Empty string ('') as Enum8 member is not loaded into ClickHouse DB #484

@sillynuy

Describe the bug
I have a source table with a column of type Enum8 where one of the allowed enum values is an empty string (''). When I try to load this table into the destination using dlt, the pipeline fails with the following error:
ValueError: Invalid enum member name
To Reproduce

  1. Create a table with the following schema:
CREATE TABLE test_table (
    x Enum8('foo' = 0, 'bar' = 1, '' = 2)
) ENGINE = MergeTree
ORDER BY tuple();
  2. Insert some data into this table, including a row with the empty-string enum value.
  3. Use the following dlt pipeline code to load the data:
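import dlt
from dlt.sources.sql_database import sql_database


def cl_test_table():
    # Source credentials are resolved from the dlt config/secrets.
    # Table reflection happens inside this call, which is where the error is raised.
    source = sql_database(
        backend="pyarrow",
        table_names=["test_table"],
    )

    pipeline = dlt.pipeline(
        pipeline_name="pl_cl_test_table",
        destination="clickhouse",
        dataset_name="bronze",
    )

    info = pipeline.run(
        source,
        write_disposition="append",
    )
    print(info)
    return info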
  4. Run the pipeline and observe the exception:
ValueError: Invalid enum member name: 
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/utils.py", line 56, in op_execution_error_boundary
    yield
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dagster/_utils/__init__.py", line 391, in iterate_with_context
    next_output = next(iterator)
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/compute_generator.py", line 129, in _coerce_op_compute_fn_to_iterator
    result = invoke_compute_fn(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/compute_generator.py", line 117, in invoke_compute_fn
    return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
  File "/home/my_username/repo/my_project/dlt-dagster-project/dlt_dagster_project/assets.py", line 131, in test_table_data
    cl_test_table()
  File "/home/my_username/repo/my_project/dlt-dagster-project/dlt_dagster_project/dlt_utils/pipelines/pl_cl__test_table.py", line 38, in cl_test_table
    source = sql_database(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dlt/extract/decorators.py", line 207, in call
    source = self._deco_f(*args, **kwargs)
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dlt/extract/decorators.py", line 293, in _wrap
    return _eval_rv(rv, schema_copy)
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dlt/extract/decorators.py", line 253, in _eval_rv
    _rv = list(_rv)
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dlt/sources/sql_database/__init__.py", line 107, in sql_database
    metadata.reflect(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/sqlalchemy/sql/schema.py", line 5885, in reflect
    _reflect_info = insp._get_reflection_info(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 2016, in _get_reflection_info
    columns=run(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 2002, in run
    res = meth(filter_names=_fn, **kw)
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 931, in get_multi_columns
    table_col_defs = dict(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 1129, in _default_multi_reflect
    single_tbl_method(
  File "<string>", line 2, in get_columns
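
The traceback above is cut off inside get_columns, so the exact frame that raises is not visible. The message does appear to match the one Python's own enum module produces for an empty member name. A minimal sketch, assuming the reflection step builds a Python Enum from the ClickHouse member list (that part is an assumption, and try_build_enum is a hypothetical helper used only for this illustration):

```python
import enum
from typing import Optional


def try_build_enum(members: dict) -> Optional[str]:
    """Return the error message if Python rejects the member names, else None."""
    try:
        # Functional Enum API: member names come from the dict keys.
        enum.Enum("x", members)
    except ValueError as exc:
        return str(exc)
    return None


# Same members as the Enum8 column in this report, without and with ''.
print(try_build_enum({"foo": 0, "bar": 1}))         # accepted by Python
print(try_build_enum({"foo": 0, "bar": 1, "": 2}))  # rejected with a ValueError
```

If this is indeed what happens during reflection, the failure would occur before any rows are read, which is consistent with the trace stopping at metadata.reflect rather than at pipeline.run.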

Expected behavior
The pipeline should correctly handle Enum8 columns with empty string members and successfully load the data into ClickHouse without raising ValueError.

Versions

Python 3.10.17
ClickHouse 25.1.1.4165
clickhouse-driver 0.2.9
dlt 1.10.0
