utils: port csv_dequote.pl to Python #1728
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Port the legacy csv_dequote utility from Perl to Python while preserving its CLI behavior and avoiding overwrites.
Changes:
- Added a Python implementation of `csv_dequote` using the standard-library `csv` module.
- Removed the legacy Perl implementation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| utils/csv_dequote.py | New Python version of the CSV→PSV conversion utility. |
| utils/csv_dequote.pl | Removes the old Perl implementation now replaced by Python. |
The Perl script triggered linter warnings, required the Text::CSV CPAN module to run, and could not be checked by the repository's Super-Linter Perl pass without disabling those checks. Replace it with a Python port that uses the standard-library csv module.

Behaviour matches the Perl on simple inputs (verified by diffing output on a parity test):

- usage message and exit code 1 on bad argv
- refuses to overwrite an existing output file
- default output filename derived from the basename of the input, preserving the directory-dropping behaviour of File::Basename::fileparse
- pipe-separated output with quotes stripped

Behavioural improvement: the Python csv module correctly handles embedded newlines in quoted fields (RFC 4180), where the Perl version processed input line-by-line and emitted "Unable to parse line" errors for any multi-line record. Files that worked under the Perl version are unaffected.

Fixes OSGeo#620
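The behaviours listed in the commit message above can be sketched roughly as follows. This is not the code from the PR: the function names `default_output_name` and `dequote`, the error message text, and the return value of 1 on a refused overwrite are all assumptions made for illustration, and the usage-message handling for bad argv is left out.

```python
import csv
import os
import sys


def default_output_name(input_path):
    """Derive the output name from the input basename, dropping the directory.

    Hypothetical helper: mirrors the described fileparse behaviour, so
    'subdir/x.csv' maps to 'x.psv' in the current working directory.
    """
    stem, _ext = os.path.splitext(os.path.basename(input_path))
    return stem + ".psv"


def dequote(input_path, output_path=None):
    """Convert a CSV file to pipe-separated text and return an exit code."""
    if output_path is None:
        output_path = default_output_name(input_path)

    # Refuse to overwrite an existing output file.
    # Assumption: the message wording and the exit code 1 are illustrative.
    if os.path.exists(output_path):
        print(f"{output_path} already exists, not overwriting", file=sys.stderr)
        return 1

    # newline="" lets the csv module handle line endings itself,
    # as the csv documentation recommends.
    with open(input_path, newline="") as src, \
            open(output_path, "w", newline="") as dst:
        for row in csv.reader(src):
            # csv.reader has already stripped the quotes from each field.
            dst.write("|".join(row) + "\n")
    return 0
```

Under these assumptions, calling `dequote("subdir/x.csv")` would write `x.psv` into the current directory, and a second call would return 1 without touching the existing file.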
a87dc52 to 1b850e4
Thanks, all three are valid and addressed in 1b850e4.
Re-tested parity with the Perl version on the simple non-multiline-record case: output is byte-identical, exit code 0 on success.
Summary
Fixes #620.
Replaces `utils/csv_dequote.pl` with a Python port (`utils/csv_dequote.py`). The Perl script triggered `perlcritic` warnings, required the `Text::CSV` CPAN module to run, and forced Super-Linter's Perl checks to be disabled. The Python port uses the standard-library `csv` module, so it has no external dependency.
Behaviour
Matched against the Perl on a parity test (diff identical for non-multiline-record inputs):
- Default output filename derived from the input basename (the `File::Basename::fileparse` quirk of dropping the directory component is preserved: `subdir/x.csv` → `x.psv` in cwd)
Behavioural improvement
The Python `csv` module correctly handles embedded newlines in quoted fields per RFC 4180. The Perl version processed input line-by-line and emitted `Unable to parse line: ...` errors for any multi-line record, then dropped the row entirely. Files that worked under the Perl version are unaffected; files with multi-line records that previously failed now succeed.
Test plan
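The multi-line record case can be checked with a small sketch like the one below. The sample data is made up for illustration. It contrasts a line-by-line view of the input, which is roughly what the description attributes to the Perl version, with what `csv.reader` returns for the same text.

```python
import csv
import io

# Made-up sample: the second record has a newline inside a quoted field.
data = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# Line-by-line view: the quoted field is split across two physical lines,
# so neither half is a complete CSV record on its own.
physical_lines = data.splitlines()

# csv.reader view: the embedded newline stays inside a single field.
records = list(csv.reader(io.StringIO(data, newline="")))

for record in records:
    print(record)
```

Here the input has four physical lines but only three logical records, which is why a parser that treats each line as a record cannot handle the second row.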
- Default output filename (`x.csv` → `x.psv`)
- Directory component dropped (`subdir/x.csv` → `x.psv` in cwd)
Other Perl scripts in the repo
`src/imagery/i.pr/PRLIB/extract_ps.pl` and `extract_functions.pl` are scoped to the `i.pr` module and out of scope for #620. If desired, those can be a follow-up.
cc @wenzeslaus