Skip to content

Fix _col=<pk> producing duplicate column in output#2774

Merged
simonw merged 1 commit into
simonw:mainfrom
riteshkew:fix-1975-col-pk-duplicate
Jun 23, 2026
Merged

Fix _col=<pk> producing duplicate column in output#2774
simonw merged 1 commit into
simonw:mainfrom
riteshkew:fix-1975-col-pk-duplicate

Conversation

@riteshkew

@riteshkew riteshkew commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Fixes:

When _col=<pk_column> is passed (e.g. ?_col=id&_col=title&_col=body on a table whose pk is id), _columns_to_select starts columns with the pks and then extends it with the deduplicated _col list — but the dedup is only within _col itself, not against the pks already in columns. So id ends up in the SELECT twice:

$ /simonwillisonblog/blog_entry.csv?_col=id&_col=title&_col=body&_size=1
id,id,title,body
1,1,WaSP Phase II,...

This skips _col values that already appear in columns (i.e. the pks) when extending:

columns.extend(c for c in dict.fromkeys(_cols) if c not in columns)

Behavior notes:

  • Pks still appear first in the configured order; ordering of additional _col values is unchanged.
  • Existing _col=state&_col=state dedup-within-_col behavior is preserved.
  • _nocol interaction is unchanged — it still filters the final list.

Added two regression cases to test_col_nocol, both linked to the issue:

  • ?_col=pk&_col=state["pk", "state"]
  • ?_col=pk["pk"]

Also probed multi-pk tables (pk=[a,b]) and pk-only requests; both produce the expected dedup behavior.

Prepared with AI assistance.

When `_col=<pk_column>` is passed, the column is emitted twice in the SELECT
because `_columns_to_select` starts `columns` with the pks and then extends
it with the deduplicated `_col` values without re-checking against the
already-added pks. Visible as:

    /blog_entry.csv?_col=id&_col=title&_col=body
    id,id,title,body
    1,1,WaSP Phase II,...

Skip `_col` values that are already in `columns` (the pks) when extending.
The pks still appear first in the configured order; ordering of any
additional `_col` values is unchanged.

Added two regression cases to `test_col_nocol`: `?_col=pk` and
`?_col=pk&_col=state`, both linked to the issue.
@simonw simonw added the bug label Jun 23, 2026
@simonw

simonw commented Jun 23, 2026

Copy link
Copy Markdown
Owner

I confirmed that the new tests fail without this fix:

    async def test_col_nocol(ds_client, path, expected_columns):
        response = await ds_client.get(path + "&_extra=columns")
        assert response.status_code == 200
        columns = response.json()["columns"]
>       assert columns == expected_columns
E       AssertionError: assert ['pk', 'pk'] == ['pk']
E         
E         Left contains one more item: 'pk'
E         Use -v to get more diff

tests/test_table_api.py:1464: AssertionError
================================================================== short test summary info ==================================================================
FAILED tests/test_table_api.py::test_col_nocol[/fixtures/facetable.json?_col=pk&_col=state-expected_columns4] - AssertionError: assert ['pk', 'pk', 'state'] == ['pk', 'state']
FAILED tests/test_table_api.py::test_col_nocol[/fixtures/facetable.json?_col=pk-expected_columns5] - AssertionError: assert ['pk', 'pk'] == ['pk']

@simonw

simonw commented Jun 23, 2026

Copy link
Copy Markdown
Owner

Thanks for this!

@simonw simonw merged commit 463eea2 into simonw:main Jun 23, 2026
19 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants