[ISSUE] TypeError in _unknown_error() when API returns unparseable error on streaming request #1264

@davwil

Description

_unknown_error() in sdk/errors/parser.py crashes with TypeError: object of type '_io.BytesIO' has no len() when the API returns an unparseable error response on a request that had a data parameter.

The bug is self-contained within the SDK: _BaseClient.do() wraps all data in BytesIO for retry/seek support (lines 164-166 of _base_client.py), but _unknown_error() creates RoundTrip(raw=False), which calls _redacted_dump() → len(body) on the BytesIO request body. This masks the actual API error with a TypeError, making production issues impossible to diagnose.
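
The root cause can be shown in isolation, without the SDK. This is a minimal sketch that uses only the standard library; the `body` variable is just a stand-in for the wrapped request body and is not taken from the SDK source.

```python
import io

# Stand-in for the request body after the caller's bytes were wrapped for retry/seek support.
body = io.BytesIO(b'{"key": "value"}')

message = None
try:
    len(body)  # BytesIO does not implement __len__
except TypeError as exc:
    message = str(exc)

print(message)

# The size is still available without consuming the stream or moving its position.
size = body.getbuffer().nbytes
print(size)
```

Any code path that treats the body as a str or bytes object will hit the same TypeError once the body is a stream.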

The normal request logging path in _base_client.py:295 handles this correctly by passing raw=True when data is not None, but _unknown_error() doesn't.

Reproduction

  from unittest.mock import MagicMock

  import requests
  from databricks.sdk._base_client import _BaseClient


  def test_do_crashes_when_api_returns_unparseable_error():
      client = _BaseClient(retry_timeout_seconds=1)

      def mock_request(method, url, **kwargs):
          prep = requests.PreparedRequest()
          prep.method = method
          prep.prepare_url(url, kwargs.get("params"))
          prep.prepare_headers(kwargs.get("headers"))
          prep.prepare_body(data=kwargs.get("data"), files=kwargs.get("files"), json=kwargs.get("json"))

          response = requests.Response()
          response.status_code = 500
          response._content = b"\x00\x01 binary garbage that no parser can handle"
          response.request = prep
          return response

      client._session.request = MagicMock(side_effect=mock_request)

      # Pass plain bytes — the SDK wraps them in BytesIO (line 164-166), then crashes on it.
      client.do("PUT", "https://example.com/api/2.0/fs/files/test.json", data=b'{"key": "value"}')
      # TypeError: object of type '_io.BytesIO' has no len()


  if __name__ == "__main__":
      test_do_crashes_when_api_returns_unparseable_error()

Expected behavior
When the API returns an unparseable error, the SDK should surface the actual HTTP error (status code, response body) instead of crashing with a TypeError about BytesIO.

The fix is in _unknown_error() (parser.py:39): RoundTrip should be created with raw=True when request.body is not a str, or _redacted_dump() should handle non-string body types gracefully.
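
To illustrate the second option, here is a rough sketch of a body-preview helper that tolerates non-string bodies. The name `preview_body` and the `max_bytes` parameter are hypothetical and do not exist in the SDK; the real `_redacted_dump()` also performs redaction, which this sketch does not attempt.

```python
import io


def preview_body(body, max_bytes=96):
    """Return a short printable preview of a request body without raising.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    if body is None:
        return ""
    if isinstance(body, str):
        data = body.encode("utf-8")
    elif isinstance(body, (bytes, bytearray)):
        data = bytes(body)
    elif all(hasattr(body, name) for name in ("read", "seek", "tell")):
        # Seekable stream: peek at the start, then restore the position
        # so that a later retry can still read the full body.
        position = body.tell()
        body.seek(0)
        data = body.read(max_bytes + 1)
        body.seek(position)
        if isinstance(data, str):
            data = data.encode("utf-8")
    else:
        # Unknown body type: describe it instead of guessing at its contents.
        return f"<{type(body).__name__} body>"

    suffix = "... (truncated)" if len(data) > max_bytes else ""
    return data[:max_bytes].decode("utf-8", errors="replace") + suffix


print(preview_body(io.BytesIO(b'{"key": "value"}')))
print(preview_body(b"x" * 200, max_bytes=10))
```

The important property is that the logging path never raises on an unexpected body type, so the original HTTP error still reaches the caller.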

Is it a regression?
Unknown. The BytesIO wrapping in do() and the _unknown_error() fallback path appear to have been present for multiple versions. The bug surfaces only when the API returns a response that none of the standard error parsers can handle, which may be rare in practice.

Debug Logs
None available: the error occurs before the SDK can produce a debug log for the failed request.

Other Information

  • OS: Windows 11
  • Version: 0.67.0

Additional context
Call chain:

  1. _BaseClient.do() wraps data in BytesIO (lines 164-166 of _base_client.py)
  2. _perform() sends the request, calls _record_request_log() with raw=True (works fine)
  3. get_api_error() finds response.ok == False, tries all error parsers, none can parse it
  4. Falls back to _unknown_error() which creates RoundTrip(raw=False) (line 39 of parser.py)
  5. RoundTrip.generate() calls _redacted_dump("> ", request.body) (line 47 of round_trip_logger.py)
  6. _redacted_dump() calls len(body) on the BytesIO → TypeError

Also note: the error message in _unknown_error() points users to databricks-sdk-go/issues instead of databricks-sdk-py/issues.
