Commit 5ff2a41

Merge pull request #201 from databricks-solutions/codex/alternative-install-path
Add Databricks notebook installer for Genie Workbench
2 parents 6e8db34 + 283cd9e commit 5ff2a41

26 files changed

Lines changed: 3028 additions & 404 deletions

AGENTS.md

Lines changed: 9 additions & 5 deletions
@@ -27,6 +27,7 @@ cd frontend && npm run lint # ESLint
 ./scripts/deploy.sh --update # Code-only update (faster, skips app creation)
 ./scripts/deploy.sh --destroy # Tear down app and clean up jobs (see Gotchas for scope)
 ./scripts/deploy.sh --destroy --auto-approve # Tear down without confirmation prompt
+# Databricks notebook install path: clone into Databricks Git and run notebooks/install.py

 # Dependency management
 # requirements.txt is auto-generated from uv.lock — do not edit manually.
@@ -76,12 +77,15 @@ backend/
 scripts/
 install.sh # Guided first-time setup (creates .env.deploy, provisions resources)
 deploy.sh # Build + bundle deploy (job) + app deploy (idempotent)
+deploy_lib/ # Shared Python deployment library used by notebooks/install.py
 preflight.sh # Pre-deploy validation checks
 build.sh # Frontend build
 deploy-config.sh # Shared deploy configuration/variables
 grant_permissions.py # Grants required permissions for app resources
 setup_lakebase.py # Automates Lakebase Autoscaling project, SP role, and grants
 setup_synced_tables.py # Sets up GSO synced tables in Lakebase
+notebooks/
+install.py # Databricks-native installer using WorkspaceClient() and generated workspace source
 frontend/
 src/
 App.tsx # Root: SpaceList | SpaceDetail | AdminDashboard | CreateAgentChat
@@ -110,7 +114,7 @@ Two endpoints use `StreamingResponse` with `text/event-stream`:
 Frontend consumes these via manual `fetch` + `ReadableStream` in `lib/api.ts` (not EventSource). Buffer splitting on `\n\n`.

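The buffer-splitting step mentioned for `lib/api.ts` can be sketched as follows. This is a Python illustration of the general technique, not the frontend's TypeScript code; `split_sse_events`, `parse_data`, and the sample chunks are hypothetical names introduced only for this example.

```python
from typing import Iterable, Iterator


def split_sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete SSE events from arbitrarily chunked text.

    Events are separated by a blank line; a trailing partial event
    stays in the buffer until a later chunk completes it.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last separator is complete; the rest is partial.
        *complete, buffer = buffer.split("\n\n")
        for event in complete:
            if event:
                yield event


def parse_data(event: str) -> str:
    """Join the payloads of all `data:` lines in one event."""
    payloads = [
        line[5:].removeprefix(" ")  # drop at most one space after "data:"
        for line in event.split("\n")
        if line.startswith("data:")
    ]
    return "\n".join(payloads)


if __name__ == "__main__":
    # Hypothetical chunks: the first event is split across two network reads.
    sample = ["data: a\n", "\ndata: b\n\ndata: c"]
    for event in split_sse_events(sample):
        print(parse_data(event))
```

Keeping the unfinished tail in the buffer is what makes the parser safe when a network read ends in the middle of an event.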
 ### Lakebase Persistence
-`services/lakebase.py` uses asyncpg with graceful fallback to in-memory dicts when `LAKEBASE_HOST` is not set. Supports both provisioned Lakebase and Lakebase Autoscaling — for autoscaling, uses `client.postgres.get_endpoint()` to resolve DNS and `client.postgres.generate_database_credential()` for OAuth tokens. Schema and tables are created by the app at startup via `_ensure_schema()` (the SP owns everything it creates). Lakebase project, SP role, and database-level grants (CONNECT, CREATE) are automated by `scripts/setup_lakebase.py`, called from `deploy.sh` via `uv run`.
+`services/lakebase.py` uses asyncpg with graceful fallback to in-memory dicts when `LAKEBASE_HOST` is not set. Supports both provisioned Lakebase and Lakebase Autoscaling — for autoscaling, uses `client.postgres.get_endpoint()` to resolve DNS and `client.postgres.generate_database_credential()` for OAuth tokens. Schema and tables are created by the app at startup via `_ensure_schema()` (the SP owns everything it creates). Lakebase project, SP role, and database-level grants (CONNECT, CREATE) are automated by `scripts/setup_lakebase.py` for the local terminal path and by `scripts.deploy_lib.lakebase` for the notebook path.

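The "graceful fallback to in-memory dicts" behaviour could look roughly like the sketch below. It is not the contents of `services/lakebase.py`: `InMemoryStore`, `lakebase_configured`, and `make_store` are hypothetical names, and the asyncpg-backed branch is deliberately left out so that only the fallback decision is shown.

```python
import asyncio
import os
from typing import Any, Mapping, Optional


class InMemoryStore:
    """Dict-backed stand-in used when no Lakebase host is configured."""

    def __init__(self) -> None:
        self._rows: dict[str, Any] = {}

    async def put(self, key: str, value: Any) -> None:
        self._rows[key] = value

    async def get(self, key: str) -> Optional[Any]:
        return self._rows.get(key)


def lakebase_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when LAKEBASE_HOST is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("LAKEBASE_HOST"))


def make_store(env: Optional[Mapping[str, str]] = None) -> InMemoryStore:
    """Pick a persistence backend based on the environment."""
    if not lakebase_configured(env):
        # No host configured: state lives only for the process lifetime.
        return InMemoryStore()
    # Assumption: the real service would open an asyncpg pool here.
    raise NotImplementedError("Postgres-backed store is outside this sketch")


if __name__ == "__main__":
    store = make_store({})  # empty mapping simulates an unset LAKEBASE_HOST
    asyncio.run(store.put("session", {"id": 1}))
    print(asyncio.run(store.get("session")))
```

Passing the environment as a mapping keeps the decision testable without mutating `os.environ`.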
 ### LLM Calls
 All LLM calls go through Databricks model serving endpoints using OpenAI-compatible API. Model configured via `LLM_MODEL` env var (default: `databricks-claude-sonnet-4-6`). MLflow tracing is optional — controlled by `MLFLOW_EXPERIMENT_ID`.
@@ -135,14 +139,14 @@ Defined in `app.yaml`. Key ones:
 - `GSO_JOB_ID` — auto-injected by deploy script from bundle state
 - `GSO_WAREHOUSE_ID` — SQL warehouse for GSO queries (from app resource)

-Deploy config uses `.env.deploy` (created by `scripts/install.sh` from `.env.deploy.template`).
+Local terminal deploy config uses `.env.deploy` (created by `scripts/install.sh` from `.env.deploy.template`). The Databricks notebook installer uses widgets in `notebooks/install.py` and writes a patched `app.yaml` only into the generated workspace source folder.

 ## Dev/Test Workflow

 There is no local dev server — all testing is done by syncing code to Databricks and redeploying:

 1. Edit code locally
-2. Run `./scripts/deploy.sh --update` to build, bundle deploy, and app deploy
+2. Run `./scripts/deploy.sh --update` for local terminal installs, or rerun `notebooks/install.py` for notebook installs
 3. Test in the deployed Databricks App

 Do NOT suggest running `uvicorn` or `npm run dev` locally. The app depends on Databricks-managed resources (OBO auth, Lakebase, serving endpoints) that aren't available outside a Databricks App environment.
@@ -207,13 +211,13 @@ git add package.json package-lock.json
 - **Vite proxy** — dev frontend at :5173 proxies `/api` to :8000. In production, FastAPI serves static files from `frontend/dist/` directly.
 - **Python 3.11+** required (`pyproject.toml`). Uses `uv` for dependency management (`uv.lock` present).
 - **Root `package.json`** exists solely as a build hook for Databricks Apps. `postinstall` is a no-op. `build` checks for pre-built `frontend/dist/index.html` — if present (uploaded by `deploy.sh`), skips the rebuild; if dist is missing, runs `cd frontend && npm ci && npm run build`. This keeps CLI deploy fast while allowing workspace-folder deploys from fresh clones.
-- **Two deployment mechanisms** — `deploy.sh` manages the app (create, sync, `databricks apps deploy`) while the optimization job is managed by DABs (`databricks bundle deploy -t app`). The `app` target uses `mode: development` for per-deployer Terraform state with `presets.name_prefix: ""` for clean job names (no `[dev]` prefix). Do NOT run `databricks bundle deploy -t dev` for production — it creates prefixed orphan jobs.
+- **Two install paths** — local terminal installs use `scripts/install.sh`/`scripts/deploy.sh`, with the optimization job managed by DABs (`databricks bundle deploy -t app`). Notebook installs use `notebooks/install.py`, generate source under `/Workspace/Users/<user>/.genie-workbench-deploy/<app-name>/app`, and manage the GSO job through SDK/Jobs API reset/update semantics. Do NOT run `databricks bundle deploy -t dev` for production — it creates prefixed orphan jobs.
 - **Databricks CLI >= 0.297.2 required** — `preflight.sh` validates this automatically.
 - **`--destroy` does not remove all resources** — it deletes the app and jobs but leaves behind: Lakebase data (`genie` schema), UC schema/tables (`<catalog>.genie_space_optimizer`), Genie Space SP permissions, MLflow experiments, and synced tables. Clean these up manually if needed.
 - **`frontend/dist/` must be explicitly uploaded** with `databricks workspace import-dir` because `databricks sync --full` only uploads non-gitignored files.
 - **`requirements.txt` is databricksignored** — the platform uses `uv sync` instead of `pip install`. If you see pip dependency conflicts, verify `requirements.txt` is in `.databricksignore`.
 - **`MLFLOW_EXPERIMENT_ID` is workspace-specific** — the app validates it at startup and silently disables tracing if the experiment doesn't exist.
-- **Lakebase state is app-instance scoped** — keep `GENIE_APP_NAME` stable and use `./scripts/deploy.sh --update` for normal changes. If creating a new app instance, use a fresh `GENIE_LAKEBASE_INSTANCE`; reusing an older app's Lakebase project can leave `genie` tables/sequences owned by the old app SP.
+- **Lakebase state is app-instance scoped** — keep the app name stable and update through the same install path (`./scripts/deploy.sh --update` locally, or rerun `notebooks/install.py`). If creating a new app instance, use a fresh Lakebase project; reusing an older app's Lakebase project can leave `genie` tables/sequences owned by the old app SP.

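The minimum-version gotcha above (Databricks CLI >= 0.297.2) amounts to a numeric, component-wise comparison rather than a string comparison. The sketch below shows that check in Python. It is not how `preflight.sh` implements it; `parse_version` and `cli_version_ok` are hypothetical helpers, and the sample version strings are assumed output formats.

```python
import re

# Minimum version stated in the gotchas list.
MIN_CLI_VERSION = (0, 297, 2)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first MAJOR.MINOR.PATCH triple found in `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no semantic version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def cli_version_ok(version_output: str, minimum: tuple[int, ...] = MIN_CLI_VERSION) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(version_output) >= minimum


if __name__ == "__main__":
    # Assumed sample strings; real CLI output may be formatted differently.
    for sample in ("Databricks CLI v0.297.2", "Databricks CLI v0.296.9"):
        print(sample, "->", cli_version_ok(sample))
```

Comparing integer tuples avoids the classic string-comparison trap where "0.1000.0" would sort below "0.297.2".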
 ## Platform Build Strategy
