fix(openfeature): block initialize() until RC config arrives (#16650)
## Motivation
`DataDogProvider.initialize()` returns immediately without waiting for Remote Config data. The OpenFeature SDK then emits `PROVIDER_READY` (per spec: "READY when initialize() terminates normally"), so consumers believe the provider is ready. Flag evaluations in this window silently return default values with `reason: DEFAULT` — there is no error, no indication that config hasn't loaded yet.
This was reported by a customer running a Python script (not a long-running server). On servers the bug is masked because RC config typically arrives during startup before any evaluations happen. In scripts and short-lived processes, `set_provider()` returns in 0.00s and the very next evaluation gets defaults.
Every other Datadog OpenFeature provider blocks inside `initialize()` until config arrives:
- **Java**: `CountDownLatch.await(timeout, unit)` — default 30s
- **Go**: `sync.Cond.Wait()` inside a loop — default 30s
- **Node.js**: `await initController.wait()` (deferred Promise) — default 30s
Fixes: FFL-1843
## Changes
- `DataDogProvider.__init__()` now creates a `threading.Event` (`_config_event`) used to block `initialize()` until config arrives.
- `initialize()` checks if config already exists (fast path), otherwise calls `_config_event.wait(timeout)`. If the timeout expires without config, it raises `ProviderNotReadyError` (the SDK then dispatches `PROVIDER_ERROR`).
- `on_configuration_received()` calls `_config_event.set()` to unblock `initialize()` when the first RC payload arrives. If init already timed out, it emits `PROVIDER_READY` for late recovery.
- `shutdown()` clears the event for clean re-initialization.
- New env var `DD_EXPERIMENTAL_FLAGGING_PROVIDER_INITIALIZATION_TIMEOUT_MS` (milliseconds, default 30000) controls the timeout. It is also configurable via the constructor, in seconds: `DataDogProvider(init_timeout=10.0)`.
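
The blocking behavior described above can be sketched with nothing but the standard library. This is a simplified illustration, not the code in this commit: `SketchProvider`, `resolve_timeout`, the local `ProviderNotReadyError` stand-in, the `events` list, and the lock are all hypothetical, and the assumption that a constructor value takes precedence over the env var is mine.

```python
import os
import threading

ENV_VAR = "DD_EXPERIMENTAL_FLAGGING_PROVIDER_INITIALIZATION_TIMEOUT_MS"


class ProviderNotReadyError(Exception):
    """Local stand-in for the OpenFeature SDK exception of the same name."""


def resolve_timeout(init_timeout=None):
    """Return the init timeout in seconds.

    Assumption: an explicit constructor value (seconds) wins over the
    env var (milliseconds, default 30000).
    """
    if init_timeout is not None:
        return float(init_timeout)
    return int(os.environ.get(ENV_VAR, "30000")) / 1000.0


class SketchProvider:
    def __init__(self, init_timeout=None):
        self._timeout = resolve_timeout(init_timeout)
        self._config = None
        self._config_event = threading.Event()
        self._timed_out = False
        self._lock = threading.Lock()
        self.events = []  # stand-in for events emitted to the SDK

    def initialize(self):
        with self._lock:
            if self._config is not None:  # fast path: config already present
                return
        if self._config_event.wait(self._timeout):
            return  # first payload arrived within the timeout
        with self._lock:
            if self._config is not None:  # payload landed right at the deadline
                return
            self._timed_out = True
        raise ProviderNotReadyError(f"no configuration after {self._timeout}s")

    def on_configuration_received(self, config):
        with self._lock:
            self._config = config
            self._config_event.set()  # unblocks a waiting initialize()
            if self._timed_out:  # late recovery after a failed init
                self._timed_out = False
                self.events.append("PROVIDER_READY")

    def shutdown(self):
        with self._lock:
            self._config = None
            self._timed_out = False
            self._config_event.clear()  # allow a clean re-initialization
```

In this sketch, a timed-out `initialize()` raises, and a later `on_configuration_received()` call records `PROVIDER_READY`, mirroring the `ERROR` to `READY` recovery path.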
## Decisions
- **Blocking by default** (30s timeout) matches Java/Go/Node.js. The OpenFeature Python SDK only has `set_provider()` (no `set_provider_and_wait()` yet), and it calls `initialize()` synchronously on the caller's thread. So blocking here means `set_provider()` itself blocks — which is the correct default behavior for most users.
- **Timeout raises `ProviderNotReadyError`** rather than returning silently. This puts the provider in `ERROR` state (not premature `READY`), which is the same pattern Java and Node.js use on timeout.
- **Late recovery supported**: if config arrives after the timeout, `on_configuration_received()` emits `PROVIDER_READY` and the provider transitions from `ERROR` to `READY`.
- **No `init_timeout=0` async mode**: rather than adding a special non-blocking mode to the provider, async customers can wrap `set_provider()` in a background thread and listen for `PROVIDER_READY` events. A proper `set_provider_and_wait()` is being contributed upstream to the OpenFeature Python SDK (open-feature/python-sdk#567).
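
For callers that cannot afford to block, the background-thread workaround mentioned above might look roughly like this. The helper `set_provider_in_background` and its `on_error` parameter are hypothetical names for illustration; the function takes the blocking call as an argument, so it makes no assumptions about the SDK beyond `set_provider(provider)` being a blocking callable.

```python
import threading


def set_provider_in_background(set_provider, provider, on_error=None):
    """Run a blocking set_provider(provider) call on a daemon thread.

    Returns the started thread so callers can join it if they want to.
    Any exception raised by the call is passed to on_error, if given.
    """

    def _run():
        try:
            set_provider(provider)
        except Exception as exc:  # report instead of dying silently
            if on_error is not None:
                on_error(exc)

    thread = threading.Thread(target=_run, name="openfeature-init", daemon=True)
    thread.start()
    return thread
```

With the real SDK, one would presumably pass `openfeature.api.set_provider` as the first argument and register a `PROVIDER_READY` handler before starting the thread, so that readiness is observed through events rather than through the return of `set_provider()`.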
## Testing
Verified locally using system-tests parametric tests against a patched build:
| Test | Before fix | After fix |
|---|---|---|
| `test_ffe_evaluation_immediately_after_start_without_config` | **FAILED** — `ffe_start()` returned in 0.00s, eval returned defaults | **PASSED** — blocked 30s, timed out correctly |
| `test_ffe_init_blocks_until_config_received` | PASSED | PASSED |
| `test_ffe_init_returns_real_values_not_defaults` | PASSED | PASSED |
Server log confirms blocking: `Waiting up to 30.0s for initial FFE configuration from Remote Config`
Existing FFE parametric tests: 13 passed, 0 failed. The remaining tests errored out because of container resource exhaustion, not because of code issues.
Co-authored-by: dd-oleksii <oleksii.shmalko@datadoghq.com>