fix(ingest/oracle): make thick_mode_lib_dir work on Linux via ctypes preload #17265

Merged
askumar27 merged 1 commit into master from oracle-thick-mode-linux-preload
May 4, 2026
@askumar27
Contributor

📋 Summary

Make thick_mode_lib_dir actually work on Linux by preloading the Oracle Instant Client shared libraries via ctypes before calling oracledb.init_oracle_client(), so customers no longer need LD_LIBRARY_PATH / ldconfig configured on the executor host.

🎯 Motivation

A customer running Oracle ingestion in thick mode on a remote executor hits DPI-1047: Cannot locate a 64-bit Oracle Client library even after setting thick_mode_lib_dir. Today the Linux branch ignores that config entirely and relies on ldconfig / LD_LIBRARY_PATH, which is brittle on locked-down or containerised executor images.

Root cause is upstream: Oracle ships libclntsh.so on Linux without RUNPATH=$ORIGIN, so even when ODPI-C dlopens libclntsh.so by absolute path (which is what passing lib_dir does), the dynamic linker still has to resolve its DT_NEEDED deps (libnnz*, libclntshcore, libons, …) through LD_LIBRARY_PATH / ld.so.cache. Setting LD_LIBRARY_PATH from Python doesn't help — glibc reads it once at process startup. The python-oracledb maintainer confirms this in oracle/python-oracledb#578 and says it can only be fixed client-side or by patching the rpath of libclntsh.so itself.

So just forwarding lib_dir= to init_oracle_client() on Linux would change the error from cannot find libclntsh.so to cannot find libnnz.so — net zero. We have to load the dependent libs into the process address space ourselves.

🔧 Changes Overview

Modifications

  • OracleSource.__init__ now branches on platform:
    • Linux + thick_mode_lib_dir set → preload all .so files in that directory by absolute path with ctypes.CDLL(..., RTLD_GLOBAL), then call oracledb.init_oracle_client() with no arguments.
    • Linux + thick_mode_lib_dir unset → unchanged (rely on ldconfig / LD_LIBRARY_PATH).
    • Mac / Windows → unchanged (init_oracle_client(lib_dir=...)).
  • thick_mode_lib_dir field description updated to document that it now works on Linux.
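
The branching above can be sketched roughly as follows. This is an illustration, not the merged code: `init_thick_mode` is a hypothetical name, and the `init_oracle_client` and `preload` callables are injected so the sketch runs without python-oracledb installed. In the real source they would presumably be `oracledb.init_oracle_client` and the new preload helper.

```python
import platform
from typing import Callable, Optional


def init_thick_mode(
    lib_dir: Optional[str],
    init_oracle_client: Callable[..., None],
    preload: Callable[[str], object],
    system: Optional[str] = None,
) -> None:
    """Hypothetical call-site sketch of the platform branching."""
    system = system or platform.system()
    if system == "Linux":
        if lib_dir:
            # Map the client libraries into this process first, so the
            # loader finds them already resident when ODPI-C dlopens
            # libclntsh.so.
            preload(lib_dir)
        # With or without a preload, Linux calls init with no lib_dir.
        init_oracle_client()
    else:
        # macOS / Windows: assumed to forward lib_dir as before.
        init_oracle_client(lib_dir=lib_dir)
```

Passing `system` explicitly is only a convenience for exercising each branch without patching `platform.system()`.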

New helper

  • _preload_oracle_client_libs(lib_dir) in oracle.py. Loads libnnz*, libclntshcore, libons, optional satellites, and libclntsh last — order matters because libclntsh has DT_NEEDED entries on the others. Surfaces a clear ConfigurationError when the directory is missing or contains no Oracle libs (much more useful than the cryptic DPI-1047 we'd otherwise hit downstream). Per-file load failures are logged at DEBUG and tolerated, since e.g. libipc1 isn't shipped in every Instant Client release.
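
A minimal sketch of what such a helper might look like, assuming glob patterns for the library names mentioned above. The pattern list, the `loader` parameter, and the local `ConfigurationError` class are assumptions made so the example is self-contained; the helper in oracle.py may differ in detail.

```python
import ctypes
import glob
import logging
import os
from typing import Callable, List

logger = logging.getLogger(__name__)


class ConfigurationError(Exception):
    """Stand-in for the project's ConfigurationError (assumed)."""


# Assumed load order: dependencies first, libclntsh last.
_PRELOAD_PATTERNS = [
    "libnnz*.so*",
    "libclntshcore.so*",
    "libons.so*",
    "libipc1.so*",   # optional satellite, not in every release
    "libociei.so*",  # optional satellite
    "libclntsh.so*",
]


def preload_oracle_client_libs(
    lib_dir: str, loader: Callable[..., object] = ctypes.CDLL
) -> List[str]:
    """Load Oracle client libraries by absolute path; return those loaded."""
    if not os.path.isdir(lib_dir):
        raise ConfigurationError(f"thick_mode_lib_dir does not exist: {lib_dir}")

    paths: List[str] = []
    for pattern in _PRELOAD_PATTERNS:
        for path in sorted(glob.glob(os.path.join(lib_dir, pattern))):
            if path not in paths:
                paths.append(path)
    if not paths:
        raise ConfigurationError(f"No Oracle client libraries found in {lib_dir}")

    loaded: List[str] = []
    for path in paths:
        try:
            loader(path, mode=ctypes.RTLD_GLOBAL)
            loaded.append(path)
        except OSError as exc:
            # Tolerate per-file failures; optional libraries may be absent
            # or unloadable on a given release.
            logger.debug("Skipping %s: %s", path, exc)
    return loaded
```

The `loader` argument defaults to `ctypes.CDLL` and exists only so the ordering and error handling can be exercised with a fake in tests.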

🏗️ Architecture/Design Notes

Why ctypes.CDLL(path, RTLD_GLOBAL) instead of e.g. setting an env var?
Once an .so is mapped via dlopen, the dynamic linker records it among the process's loaded objects under its SONAME, and RTLD_GLOBAL additionally makes its symbols available to libraries loaded later. When ODPI-C then dlopens libclntsh.so, the linker resolves its DT_NEEDED entries, finds those libraries already resident, and reuses them, completely bypassing the missing RUNPATH. The effect is similar to LD_PRELOAD; we're just doing it explicitly from Python.

Scope of the side effect
The preload is per-process, not global:

  • It only mutates the calling Python process's address space.
  • Nothing changes on the host filesystem, /etc/ld.so.cache, env vars, sibling processes, or processes spawned later via subprocess/exec.
  • Within the process, the libs stay mapped until exit. Fine for an ingestion worker.

Trade-offs considered

  • Forwarding lib_dir to init_oracle_client() on Linux: rejected, see motivation.
  • patchelf --set-rpath '$ORIGIN' on libclntsh.so in docker/snippets/oracle_instantclient.sh: complementary, only fixes images we build. Worth doing as a follow-up for defense in depth, but doesn't help customers running their own executor images.

🧪 Testing

Added 7 unit tests in tests/unit/test_oracle_source.py:

Helper-level (_preload_oracle_client_libs):

  • Loads deps before libclntsh (preserving the order that makes the workaround actually work).
  • Tolerates missing optional libs (e.g. libipc1, libociei).
  • Raises a clear ConfigurationError for missing directory and for empty directory.
  • Continues past per-file OSError instead of aborting the whole preload.
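
For illustration, the "continues past per-file OSError" case could be tested along these lines by patching `ctypes.CDLL` with `unittest.mock`. `load_all` is a hypothetical, stripped-down stand-in for the real helper, and the file paths are invented; the actual tests in tests/unit/test_oracle_source.py may be structured differently.

```python
import ctypes
from typing import List
from unittest import mock


def load_all(paths: List[str]) -> List[str]:
    """Hypothetical stand-in: load each path, skipping per-file failures."""
    loaded = []
    for path in paths:
        try:
            ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
            loaded.append(path)
        except OSError:
            continue
    return loaded


def test_preload_continues_past_oserror() -> None:
    paths = [
        "/fake/libnnz19.so",
        "/fake/libipc1.so",  # simulated as unloadable
        "/fake/libclntsh.so",
    ]

    def fake_cdll(path, mode=0):
        if "libipc1" in path:
            raise OSError("cannot open shared object file")
        return object()

    # load_all looks up ctypes.CDLL at call time, so patching the module
    # attribute is enough; no real library is opened.
    with mock.patch("ctypes.CDLL", side_effect=fake_cdll) as cdll:
        loaded = load_all(paths)

    # Every file was attempted, in order, and the failure was skipped.
    assert [c.args[0] for c in cdll.call_args_list] == paths
    assert loaded == ["/fake/libnnz19.so", "/fake/libclntsh.so"]
```

Because nothing is actually dlopened, a test like this runs on any platform and does not need Instant Client installed.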

Call-site (OracleSource.__init__):

  • Linux + thick_mode_lib_dir set → preload runs, init_oracle_client() is called without lib_dir.
  • Linux + thick_mode_lib_dir unset → preload skipped, unchanged behavior.
  • macOS + thick_mode_lib_dir set → no preload, init_oracle_client(lib_dir=...) as before.

All 51 tests in test_oracle_source.py pass; ruff and mypy clean.

📊 Impact Assessment

  • Affected Components: Oracle ingestion source only. The two changed code paths are guarded behind enable_thick_mode == True, so default (thin-mode) ingestion is completely untouched.
  • Breaking Changes: None. The Linux behavior when thick_mode_lib_dir is unset is identical to before. When it is set, the previous behavior was to silently ignore it and fall through to ldconfig resolution; we now actually honor it. Anyone who was setting thick_mode_lib_dir on Linux today was either a) already working because ldconfig was set up correctly (still works), or b) hitting DPI-1047 (now fixed).
  • Performance Impact: A handful of extra dlopen() calls at source initialization. Negligible.
  • Risk Level: Low. Purely additive for the misconfigured-Linux case; existing code paths unchanged.

🚀 Deployment Notes

  • No new dependencies — ctypes and glob are stdlib.
  • Customers experiencing DPI-1047 in thick mode on Linux can now set thick_mode_lib_dir in their recipe and have it work without touching the host's ldconfig/LD_LIBRARY_PATH.
  • Our own Docker image (docker/snippets/oracle_instantclient.sh) is unaffected — it already configures ldconfig, so the unset-thick_mode_lib_dir path keeps working.

🔗 References

  • Upstream confirmation: oracle/python-oracledb#578
  • Failure mode: Oracle DPI-1047 error
  • Related: docker/snippets/oracle_instantclient.sh (potential follow-up: patchelf --set-rpath '$ORIGIN' libclntsh.so for defense in depth)

❓ Questions for Reviewers

  • Should we also do the patchelf change to our base image as a follow-up PR? It would be self-fixing for our images but only ours.
  • Description for thick_mode_lib_dir now applies to all three platforms — happy to split it back into platform-specific guidance if that reads better.

Made with Cursor

…preload

Oracle ships libclntsh.so on Linux without RUNPATH=$ORIGIN, so even when
python-oracledb dlopens it via an absolute path (the lib_dir branch), the
loader still resolves its DT_NEEDED deps (libnnz*, libclntshcore, libons,
...) through LD_LIBRARY_PATH / ld.so.cache. Hosts without those configured
fall over with DPI-1047. Setting LD_LIBRARY_PATH from Python doesn't help
because glibc reads it once at process startup.

Preload every .so in thick_mode_lib_dir with ctypes.CDLL(path, RTLD_GLOBAL)
before init_oracle_client(). Once the deps are mapped, the linker resolves
them by SONAME on the subsequent dlopen() inside ODPI-C. Effect is per
process; nothing escapes the worker.

See oracle/python-oracledb#578.

Made-with: Cursor
@github-actions
Contributor

Linear: ING-2520

@github-actions github-actions Bot added the ingestion label (PR or Issue related to the ingestion of metadata) Apr 30, 2026
@codecov

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@maggiehays maggiehays added the pending-submitter-merge label and removed the needs-review label (Label for PRs that need review from a maintainer) May 1, 2026
@askumar27 askumar27 merged commit 9b8e2b9 into master May 4, 2026
53 checks passed
@askumar27 askumar27 deleted the oracle-thick-mode-linux-preload branch May 4, 2026 19:42
Labels

ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge

4 participants