utils: port csv_dequote.pl to Python #1728

Open

Valyrian-Code wants to merge 1 commit into OSGeo:grass8 from Valyrian-Code:port-csv-dequote-to-python

@Valyrian-Code

Contributor

Summary

Fixes #620.

Replaces utils/csv_dequote.pl with a Python port (utils/csv_dequote.py). The Perl script triggered perlcritic warnings, required the Text::CSV CPAN module to run, and forced Super-Linter's Perl checks to be disabled. The Python port uses the standard-library csv module — no external dependency.

Behaviour

Matches the Perl script on a parity test (diff output is identical for inputs without multi-line records):

  • USAGE message and exit code 1 on bad argv
  • Refuses to overwrite an existing output file
  • Default output filename is derived from the basename of the input (the Perl script's File::Basename::fileparse quirk of dropping the directory component is preserved: subdir/x.csv → x.psv in cwd)
  • Pipe-separated output with double quotes stripped

Behavioural improvement

The Python csv module correctly handles embedded newlines in quoted fields per RFC 4180. The Perl version processed input line-by-line and emitted Unable to parse line: ... errors for any multi-line record, then dropped the row entirely. Files that worked under the Perl version are unaffected; files with multi-line records that previously failed now succeed.
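
The difference is easy to see with a small, made-up input (the data below is illustrative only, not taken from the parity test). Splitting on physical lines yields four pieces, while csv.reader returns three records and keeps the newline inside the quoted field:

```python
import csv
import io

# One header, one record whose quoted field spans two physical lines,
# and one ordinary record.
data = 'id,note\n1,"first line\nsecond line"\n2,plain\n'

# What a line-by-line parser is handed: 4 physical lines, two of which
# are halves of the same quoted field.
physical_lines = data.splitlines()

# What csv.reader produces: 3 logical records.
records = list(csv.reader(io.StringIO(data, newline="")))

print(len(physical_lines), len(records))
print(records[1])
```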

Test plan

  • Identical output on a CSV with quoted strings, embedded commas, escaped double quotes, empty fields, and trailing comma
  • Identical default output filename (x.csv → x.psv)
  • Identical directory-dropping behaviour (subdir/x.csv → x.psv in cwd)
  • Refuses to overwrite existing output (exit 1, error on stderr)
  • Reports missing input file via stderr with exit 1
  • USAGE printed and exit 1 on no args or too many args
  • Multi-line records now succeed (improvement over Perl)
  • pre-commit passes (ruff, ruff-format, flake8, editorconfig)

Other Perl scripts in the repo

src/imagery/i.pr/PRLIB/extract_ps.pl and extract_functions.pl are scoped to the i.pr module and out of scope for #620. If desired, those can be a follow-up.

cc @wenzeslaus

Copilot AI review requested due to automatic review settings June 7, 2026 17:20

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Port the legacy csv_dequote utility from Perl to Python while preserving its CLI behavior and avoiding overwrites.

Changes:

  • Added a Python implementation of csv_dequote using the standard-library csv module.
  • Removed the legacy Perl implementation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| utils/csv_dequote.py | New Python version of the CSV→PSV conversion utility. |
| utils/csv_dequote.pl | Removes the old Perl implementation now replaced by Python. |

Comment thread utils/csv_dequote.py
Comment thread utils/csv_dequote.py Outdated
Comment thread utils/csv_dequote.py Outdated
The Perl script triggered linter warnings, required the Text::CSV CPAN
module to run, and could not be checked by the repository's Super-Linter
Perl pass without disabling those checks. Replace with a Python port that
uses the standard-library csv module.

Behaviour matches the Perl on simple inputs (verified by diffing output
on a parity test):

- usage message and exit code 1 on bad argv
- refuses to overwrite an existing output file
- default output filename derived from the basename of the input,
  preserving the directory-dropping behaviour of File::Basename::fileparse
- pipe-separated output with quotes stripped

Behavioural improvement: the Python csv module correctly handles embedded
newlines in quoted fields (RFC 4180), where the Perl version processed
input line-by-line and emitted "Unable to parse line" errors for any
multi-line record. Files that worked under the Perl version are
unaffected.

Fixes OSGeo#620

Valyrian-Code force-pushed the port-csv-dequote-to-python branch from a87dc52 to 1b850e4 on June 7, 2026 17:22
@Valyrian-Code

Contributor Author

Thanks, all three points are valid and are addressed in 1b850e4:

  1. Moved the csv.Error handling to wrap explicit next() calls on the reader, since csv.Error is raised by parsing (not by the writes that were inside the try). Switched from the for-loop to an explicit while-next() pattern so each row's parse can fail independently and we keep processing the rest of the file.

  2. Error message now includes reader.line_num and the input file name: "Unable to parse CSV at line {N} in {infile}: {exc}".

  3. Track a had_parse_error flag and return 1 if any row failed to parse, even if subsequent rows succeeded. Callers can now detect partial output via exit status.
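
A minimal sketch of the pattern these three points describe, for anyone skimming the thread. It is not the code from 1b850e4: the function name `convert` and its signature are hypothetical, and only the message format, `reader.line_num`, and the `had_parse_error` flag come from the description above.

```python
import csv
import io
import sys


def convert(reader, dst, infile: str) -> int:
    """Write each parsed row pipe-separated; return 1 if any row failed to parse."""
    had_parse_error = False
    while True:
        try:
            # csv.Error is raised here, while parsing, not by the write below.
            row = next(reader)
        except StopIteration:
            break
        except csv.Error as exc:
            print(
                f"Unable to parse CSV at line {reader.line_num} in {infile}: {exc}",
                file=sys.stderr,
            )
            had_parse_error = True
            continue  # keep processing the rest of the file
        dst.write("|".join(row) + "\n")
    return 1 if had_parse_error else 0
```

To try the failure path, one option is to build the reader with `strict=True` (used here only to provoke an error, not something the PR states) and feed it a line with malformed quoting such as `"x"y,z`. The rows before and after the bad line are still written and the return value is 1.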

Re-tested parity with the Perl version on the simple non-multiline-record case — output is byte-identical, exit code 0 on success.


Development

Successfully merging this pull request may close these issues.

[Bug] Fix or replace csv_dequote.pl Perl script