Summary
When calling get_source() or similar methods that return a SourceResponse, the configuration field is incorrectly deserialized to SourceAirtable regardless of the actual source type. This prevents users from accessing source-specific configuration fields like bucket, streams, or globs for S3 sources.
Root Cause
The SourceConfiguration type is a Python Union of 500+ source configuration types. When deserializing JSON responses:
- dataclasses_json tries each type in the Union in order until one succeeds
- SourceAirtable appears early in the Union (position 8) and has all optional fields
  - The @dataclass_json(undefined=Undefined.EXCLUDE) decorator ignores unknown fields
  - Since SourceAirtable has no required fields, it successfully deserializes ANY JSON payload
The OpenAPI spec uses oneOf without a discriminator field, so there is no way for the SDK to determine the correct type to deserialize to.
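The first-match behavior described above can be modeled with the standard library alone. This is a simplified sketch, not the SDK's or dataclasses_json's actual code: AirtableLike, S3Like, lenient_load, and first_match are hypothetical stand-ins chosen to mirror the two conditions that matter (an all-optional type listed first, and unknown keys being dropped).

```python
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args


@dataclass
class AirtableLike:
    # Hypothetical stand-in: every field is optional, as with SourceAirtable.
    credentials: Optional[dict] = None


@dataclass
class S3Like:
    # Hypothetical stand-in: `bucket` is required, as on an S3 configuration.
    bucket: str
    streams: Optional[list] = None


# The all-optional type comes first, as SourceAirtable does in the real Union.
ConfigUnion = Union[AirtableLike, S3Like]


def lenient_load(cls, payload: dict):
    """Build cls from payload, dropping unknown keys (like Undefined.EXCLUDE)."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


def first_match(union, payload: dict):
    """Try each Union member in declaration order; return the first that loads."""
    for cls in get_args(union):
        try:
            return lenient_load(cls, payload)
        except TypeError:
            continue  # a required field was missing; try the next member
    raise ValueError("no Union member accepted the payload")


s3_payload = {"bucket": "my-bucket", "streams": [{"globs": ["*.csv"]}]}
result = first_match(ConfigUnion, s3_payload)
print(type(result).__name__)  # AirtableLike
```

Because every key in the S3 payload is unknown to AirtableLike and gets dropped, the first candidate is constructed with no arguments and succeeds, so S3Like is never tried.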
Reproduction
from airbyte_api import AirbyteAPI
from airbyte_api.models import Security
client = AirbyteAPI(security=Security(bearer_auth="..."))
# Get an S3 source
response = client.sources.get_source(source_id="1f5ca207-5c40-48d6-b9d1-6667de9fe427")
source = response.source_response
print(f"Source type: {source.source_type}") # "s3"
print(f"Config type: {type(source.configuration).__name__}") # "SourceAirtable" (WRONG!)
print(f"Has bucket? {hasattr(source.configuration, 'bucket')}") # False (WRONG!)
Expected Behavior
The configuration field should be deserialized to SourceS3 when source_type is "s3".
Workaround
Users can access the raw configuration dict or manually deserialize to the correct type:
import json
from airbyte_api import utils
from airbyte_api.models import SourceS3
# Option 1: Access raw config dict
raw_config = response.raw_response.json()["configuration"]
print(raw_config["bucket"]) # Works
print(raw_config["streams"][0]["globs"]) # Works
# Option 2: Manually deserialize to correct type
config_json = json.dumps(response.raw_response.json()["configuration"])
s3_config = utils.unmarshal_json(config_json, SourceS3)
print(s3_config.bucket) # Works
print(s3_config.streams[0].globs) # Works
Potential Fix
A proper fix would require either:
- Adding a discriminator field to the OpenAPI spec for SourceConfiguration using the sourceType property
- Modifying Speakeasy's generation to handle discriminated unions based on a sibling field
Since this is a generated SDK, any direct code changes would be overwritten on regeneration.
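To illustrate what either fix would achieve, here is a minimal sketch of selecting the configuration class from the sibling sourceType value instead of trying each Union member in turn. It uses only the standard library; AirtableLike, S3Like, CONFIG_BY_SOURCE_TYPE, and load_configuration are hypothetical names for illustration, and the response shape is assumed. It does not show how Speakeasy would generate such code.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AirtableLike:
    # Hypothetical stand-in for SourceAirtable (all fields optional).
    credentials: Optional[dict] = None


@dataclass
class S3Like:
    # Hypothetical stand-in for SourceS3 (`bucket` is required).
    bucket: str
    streams: Optional[list] = None


# Hypothetical registry keyed by the sibling `sourceType` value.
CONFIG_BY_SOURCE_TYPE = {"airtable": AirtableLike, "s3": S3Like}


def load_configuration(response: dict):
    """Choose the configuration class from sourceType, then load it."""
    source_type = response["sourceType"]
    try:
        cls = CONFIG_BY_SOURCE_TYPE[source_type]
    except KeyError:
        raise ValueError(f"unknown sourceType: {source_type!r}") from None
    known = {f.name for f in fields(cls)}
    payload = response["configuration"]
    # Unknown keys are still dropped, but the target type is no longer guessed.
    return cls(**{k: v for k, v in payload.items() if k in known})


response = {
    "sourceType": "s3",
    "configuration": {"bucket": "my-bucket", "streams": [{"globs": ["*.csv"]}]},
}
config = load_configuration(response)
print(type(config).__name__, config.bucket)  # S3Like my-bucket
```

With an explicit lookup, the order of types in the Union no longer matters, and an unrecognized sourceType fails loudly instead of silently producing the wrong type.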
Context
This issue was reported by a customer trying to modify streams.globs at runtime from Airflow. Investigation requested by Ilja Herdt (Airbyte) (@iherdt-airbyte).
Related: Commit 87f7e7ba removed some discriminators from the OpenAPI spec in September 2024.