Fix _col=<pk> producing duplicate column in output by riteshkew · Pull Request #2774 · simonw/datasette

riteshkew · 2026-06-12T18:23:58Z

Fixes #1975: _col=id can cause id column to export twice in CSV export

When _col=<pk_column> is passed (e.g. ?_col=id&_col=title&_col=body on a table whose pk is id), _columns_to_select starts columns with the pks and then extends it with the deduplicated _col list. That dedup only applies within _col itself, not against the pks already in columns, so id ends up in the SELECT twice:

$ /simonwillisonblog/blog_entry.csv?_col=id&_col=title&_col=body&_size=1
id,id,title,body
1,1,WaSP Phase II,...

This skips _col values that already appear in columns (i.e. the pks) when extending:

columns.extend(c for c in dict.fromkeys(_cols) if c not in columns)
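For illustration, here is a minimal standalone sketch of the before and after behavior. The function names `columns_before_fix` and `columns_after_fix` are hypothetical and are not Datasette's API; the "before" version is reconstructed from the description above (pks first, then the deduplicated `_col` list), so treat it as an approximation of the original code rather than a copy of it.

```python
def columns_before_fix(pks, cols):
    """Hypothetical reconstruction of the old behavior described above."""
    columns = list(pks)
    # Deduplicates within cols only, so a pk named in cols is added again.
    columns.extend(dict.fromkeys(cols))
    return columns


def columns_after_fix(pks, cols):
    """Same logic, using the extend line from this PR."""
    columns = list(pks)
    # Skip any requested column that is already present (i.e. a pk).
    columns.extend(c for c in dict.fromkeys(cols) if c not in columns)
    return columns


requested = ["id", "title", "body"]
print(columns_before_fix(["id"], requested))  # ['id', 'id', 'title', 'body']
print(columns_after_fix(["id"], requested))   # ['id', 'title', 'body']
```

`dict.fromkeys` is used for the within-`_col` dedup because dictionaries preserve insertion order, so the first occurrence of each column keeps its position.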

Behavior notes:

Pks still appear first in the configured order; ordering of additional _col values is unchanged.
Existing _col=state&_col=state dedup-within-_col behavior is preserved.
_nocol interaction is unchanged — it still filters the final list.

Added two regression cases to test_col_nocol, both linked to the issue:

?_col=pk&_col=state → ["pk", "state"]
?_col=pk → ["pk"]

Also probed multi-pk tables (pk=[a,b]) and pk-only requests; both produce the expected dedup behavior.
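The behavior notes and probes above can be checked against a small standalone model. This is only a sketch: `select_columns` is a hypothetical helper, not the real `_columns_to_select`, and the `_nocol` step assumes that it simply drops the named columns from the final list, which is all the description above states. The column name `city` is also made up for the example.

```python
def select_columns(pks, cols, nocols=()):
    """Hypothetical model of the column selection described in this PR."""
    columns = list(pks)
    # The fix: skip requested columns that are already present (the pks).
    columns.extend(c for c in dict.fromkeys(cols) if c not in columns)
    # Assumption: _nocol filters the final list by removing named columns.
    return [c for c in columns if c not in nocols]


# The two regression cases added to test_col_nocol.
print(select_columns(["pk"], ["pk", "state"]))     # ['pk', 'state']
print(select_columns(["pk"], ["pk"]))              # ['pk']

# Dedup within _col is preserved.
print(select_columns(["pk"], ["state", "state"]))  # ['pk', 'state']

# Multi-pk table: pks stay first in their configured order.
print(select_columns(["a", "b"], ["b", "c", "a"]))  # ['a', 'b', 'c']

# _nocol still filters the final list (assumed semantics, see above).
print(select_columns(["pk"], ["pk", "state", "city"], nocols=["state"]))  # ['pk', 'city']
```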

Prepared with AI assistance.

When `_col=<pk_column>` is passed, the column is emitted twice in the SELECT because `_columns_to_select` starts `columns` with the pks and then extends it with the deduplicated `_col` values without re-checking against the already-added pks. Visible as:

/blog_entry.csv?_col=id&_col=title&_col=body
id,id,title,body
1,1,WaSP Phase II,...

Skip `_col` values that are already in `columns` (the pks) when extending. The pks still appear first in the configured order; ordering of any additional `_col` values is unchanged.

Added two regression cases to `test_col_nocol`: `?_col=pk` and `?_col=pk&_col=state`, both linked to the issue.

simonw · 2026-06-23T21:18:21Z

I confirmed that the new tests fail without this fix:

    async def test_col_nocol(ds_client, path, expected_columns):
        response = await ds_client.get(path + "&_extra=columns")
        assert response.status_code == 200
        columns = response.json()["columns"]
>       assert columns == expected_columns
E       AssertionError: assert ['pk', 'pk'] == ['pk']
E         
E         Left contains one more item: 'pk'
E         Use -v to get more diff

tests/test_table_api.py:1464: AssertionError
================================================================== short test summary info ==================================================================
FAILED tests/test_table_api.py::test_col_nocol[/fixtures/facetable.json?_col=pk&_col=state-expected_columns4] - AssertionError: assert ['pk', 'pk', 'state'] == ['pk', 'state']
FAILED tests/test_table_api.py::test_col_nocol[/fixtures/facetable.json?_col=pk-expected_columns5] - AssertionError: assert ['pk', 'pk'] == ['pk']

simonw · 2026-06-23T21:18:35Z

Thanks for this!

simonw added the bug label Jun 23, 2026

simonw merged commit 463eea2 into simonw:main Jun 23, 2026
19 of 20 checks passed


simonw merged 1 commit into simonw:main from riteshkew:fix-1975-col-pk-duplicate
