# Docker: Node-side query and analysis

This folder contains the code that runs **inside the TES container** on each TRE node. It executes the user’s analysis (SQL plus optional Python) against the node’s database and writes the result (e.g. JSON) for the client to aggregate.

## Purpose

- The **user query** and **analysis type** are passed in as CLI arguments.
- The **analysis type** is looked up in the `LOCAL_PROCESSING_CLASSES` registry in `local_processing.py`.
- Each analysis class is responsible for (a rough sketch follows below):
  - building the SQL query (from the user query plus analysis-specific logic),
  - running it against the node DB,
  - optional Python-side analysis on the result.
- Results are written to a file (e.g. JSON) and later collected and aggregated on the client side.

So this code does the **per-node, partial** work; aggregation across TREs happens elsewhere (orchestrator / client).
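
The exact interface is defined in `local_processing_base.py` and `local_processing.py`; the snippet below is only a rough sketch of the shape of an analysis class and its registry entry (class layout, method names, and the SQL are assumptions, not the real code):

```python
# Hypothetical sketch only; the real BaseLocalProcessing and registry may
# differ in names and signatures.
from abc import ABC, abstractmethod


class BaseLocalProcessing(ABC):
    """An analysis class builds SQL and optionally post-processes the result in Python."""

    def __init__(self, user_query: str):
        self.user_query = user_query

    @abstractmethod
    def build_query(self) -> str:
        """Return the SQL to run against the node database."""

    def python_analysis(self, rows):
        """Optional Python-side step; by default the rows pass through unchanged."""
        return rows


class Mean(BaseLocalProcessing):
    def build_query(self) -> str:
        # Emit partial statistics (sum and count) so the client can combine nodes.
        return f"SELECT SUM(value) AS total, COUNT(value) AS n FROM ({self.user_query}) AS q"


# Maps the --analysis argument to a processor class.
LOCAL_PROCESSING_CLASSES = {"mean": Mean}
```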

## Flow

1. **Entrypoint** — Container runs `python query_resolver.py` with CLI args (`--user-query`, `--analysis`, `--db-connection` or env, `--output-filename`, `--output-format`); a sketch of this CLI surface follows the list.
2. **query_resolver.py** — Parses the connection string (from env or `--db-connection`), then calls `process_query()`.
3. **process_query()** — Resolves the DB connection, looks up the analysis in `LOCAL_PROCESSING_CLASSES`, instantiates the processor, builds and runs the query, runs optional Python analysis, and writes the result to disk.
4. **local_processing.py** — Defines the registry and analysis classes (e.g. Mean, Variance, PMCC, ContingencyTable). Each class extends `BaseLocalProcessing` (from `local_processing_base.py`) and implements query building and optional Python analysis.
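
The CLI options listed in step 1 are handled by Click in `query_resolver.py`; as an illustration only, the entrypoint might be declared roughly like this (option defaults and the function body are assumptions; the real module wires the options into `parse_connection_string()` and `process_query()`):

```python
# Illustrative sketch of the Click entrypoint; the real query_resolver.py
# differs in wiring and may differ in option defaults.
import click


@click.command()
@click.option("--user-query", required=True, help="User SQL passed in by the task.")
@click.option("--analysis", required=True, help="Key into LOCAL_PROCESSING_CLASSES.")
@click.option("--db-connection", default=None, help="Connection string; falls back to postgres* env vars.")
@click.option("--output-filename", default="output", help="Base name of the result file.")
@click.option("--output-format", default="json", help="Result format, e.g. json.")
def main(user_query, analysis, db_connection, output_filename, output_format):
    # The real entrypoint resolves the connection string (env fallback when
    # --db-connection is omitted) and hands everything to process_query().
    click.echo(f"analysis={analysis}, output={output_filename}.{output_format}")


if __name__ == "__main__":
    main()
```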

## Main modules

| File | Role |
|------|------|
| `query_resolver.py` | Click CLI, connection string parsing (`parse_connection_string`), and `process_query()` (orchestrates DB connection, registry lookup, execution, output). |
| `local_processing.py` | `LOCAL_PROCESSING_CLASSES` registry and concrete analysis classes (Mean, Variance, etc.). |
| `local_processing_base.py` | `BaseLocalProcessing` abstract base class (query building, optional Python analysis hook). |
| `Dockerfile` | Builds the image that runs this code (Python 3.12, dependencies, entrypoint `query_resolver.py`). |

## Database connection

- If `--db-connection` is **not** provided, the connection string is built from environment variables: `postgresUsername`, `postgresPassword`, `postgresServer`, `postgresPort`, `postgresDatabase` (see `validate_environment()` and `parse_connection_string(None)` in `query_resolver.py`). This is the normal case when the container is launched by TES with env set by the task.
- If `--db-connection` is provided, it can be a SQLAlchemy-style URL (`postgresql://...`) or a semicolon-separated key=value string (`Host=...;Username=...;...`); both forms are sketched below.
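
As an assumption about how `parse_connection_string` might normalise these inputs (the key names `Password`, `Port`, and `Database` in the key=value branch are guesses), a sketch:

```python
# Rough sketch of connection-string handling; not a copy of the real
# parse_connection_string in query_resolver.py.
import os


def parse_connection_string(db_connection: str | None) -> str:
    if db_connection is None:
        # Normal TES case: build the URL from the postgres* environment variables.
        return (
            f"postgresql://{os.environ['postgresUsername']}:{os.environ['postgresPassword']}"
            f"@{os.environ['postgresServer']}:{os.environ['postgresPort']}"
            f"/{os.environ['postgresDatabase']}"
        )
    if db_connection.startswith("postgresql"):
        # Already a SQLAlchemy-style URL.
        return db_connection
    # Semicolon-separated key=value form, e.g. "Host=...;Username=...;Password=...;Database=...".
    parts = dict(item.split("=", 1) for item in db_connection.split(";") if item)
    return (
        f"postgresql://{parts['Username']}:{parts['Password']}"
        f"@{parts['Host']}:{parts.get('Port', '5432')}/{parts['Database']}"
    )
```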

## Building and running

From the repo root or this directory, build the image (see project docs or `tests/` for the exact image name and test usage). The container expects either postgres* env vars or `--db-connection`, plus `--user-query`, `--analysis`, and optional output options.
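
As a purely illustrative example of exercising the image from Python with the Docker SDK (the image tag, query, and credentials below are placeholders, not values from this repo):

```python
# Illustrative only: image name, query, and credentials are placeholders.
import docker

client = docker.from_env()
logs = client.containers.run(
    "node-query:latest",  # hypothetical tag; see the project docs / tests for the real name
    command=[
        "--user-query", "SELECT value FROM measurements",
        "--analysis", "mean",
        "--output-filename", "result",
        "--output-format", "json",
    ],
    environment={
        "postgresUsername": "user",
        "postgresPassword": "secret",
        "postgresServer": "db.internal",
        "postgresPort": "5432",
        "postgresDatabase": "tre",
    },
    remove=True,
)
print(logs.decode())
```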

For the **bunny**-based workflow (different image and entrypoint), see `bunny-wrapper/`.