A Terraform-managed Snowflake data lake that ingests 25,000 nested JSON banking records through a BRONZE → SILVER → GOLD medallion pipeline and surfaces a Customer 360 risk analytics dashboard via Streamlit in Snowflake.
- Overview
- Architecture
- Repository Structure
- Prerequisites
- Getting Started
- Snowflake Objects
- Streamlit Dashboard
- Dataset
- Domain Glossary
- Teardown
- License
NorthBridge Bank is a fictional retail bank whose customer data is fragmented across core banking, loan origination, CRM, and transaction processing systems. This project consolidates that data into a unified analytical layer on Snowflake, enabling:
- Customer 360 profiling across all products and channels
- Loan portfolio health monitoring and NPL tracking
- Transaction trend analysis by channel, segment, and region
- AML/KYC compliance risk scoring and flagging
All AWS and Snowflake infrastructure is provisioned via Terraform with a config-driven approach: resource names come from `infra/platform/tf/config/` and are never hardcoded in `.tf` files.
flowchart TD
S3[("S3<br/>northbridge-raw-data/raw-data/json/")]
SQS{{"SQS event notification<br/>s3:ObjectCreated:*"}}
PIPE["Snowpipe auto-ingest<br/>RAW_NORTHBRIDGE_PIPE"]
subgraph BRONZE["BRONZE – Raw"]
RAW["RAW_NORTHBRIDGE<br/>VARIANT + audit columns<br/>25,000 records"]
end
subgraph SILVER["SILVER – Cleansed"]
CLEAN["CLEAN_NORTHBRIDGE_DT<br/>Dynamic Table – typed & cleansed"]
end
subgraph GOLD["GOLD – Curated"]
DIMS["Dimensions<br/>DIM_CUSTOMER · DIM_BRANCH<br/>DIM_PRODUCT · DIM_DATE"]
FACTS["Facts<br/>FACT_TRANSACTIONS · FACT_LOANS<br/>FACT_ACCOUNT_BALANCES"]
VIEWS["Views<br/>V_KPI_SUMMARY · V_SEGMENT_STATS<br/>V_LOAN_PORTFOLIO · V_MONTHLY_TXN_TRENDS<br/>V_REGIONAL_PERF · V_RISK_DISTRIBUTION"]
end
subgraph SERVE["STREAMLIT – Serving"]
DASH["Customer 360 Dashboard<br/>5 tabs · sidebar filters"]
end
end
S3 -->|"s3:ObjectCreated:*"| SQS
SQS --> PIPE
PIPE --> RAW
RAW -->|"Dynamic Table auto-refresh<br/>target_lag = downstream"| CLEAN
CLEAN -->|"Dynamic Tables + UDFs<br/>PROMINENT_INDEX · THREE_SUB_INDEX_CRITERIA · GET_INT"| DIMS
CLEAN --> FACTS
DIMS --> VIEWS
FACTS --> VIEWS
VIEWS -->|"Streamlit in Snowflake<br/>STREAMLIT_WH"| DASH
flowchart TD
subgraph P1["Phase 1 – AWS Resources"]
P1A["module.s3<br/>S3 bucket (landing zone)"]
P1B["module.iam_role<br/>IAM role – trust policy computed from<br/>live storage integration output;<br/>account-root placeholder on first create"]
end
subgraph P2["Phase 2 – Snowflake Resources (strict dependency order)"]
direction TB
P2A["module.warehouse<br/>LOAD_WH · TRANSFORM_WH · STREAMLIT_WH · ADHOC_WH"]
P2B["module.database_schemas<br/>NORTHBRIDGE_DATABASE + 4 schemas"]
P2C["module.file_formats<br/>JSON_FILE_FORMAT"]
P2D["module.storage_integrations<br/>S3_STORAGE_INTEGRATION"]
P2E["module.api_integrations<br/>GitHub API integration – account-level;<br/>currently 0 resources (see caveat below)"]
P2F["module.stage<br/>RAW_EXTERNAL_STG · RAW_INTERNAL_STG · STREAMLIT_STG"]
P2G["module.table<br/>BRONZE.RAW_NORTHBRIDGE"]
P2A --> P2B --> P2C --> P2D --> P2E --> P2F --> P2G
end
subgraph P3["Phase 3 – AWS Trust Policy Reconcile"]
P3A["module.aws_iam_role_final<br/>Re-pushes live STORAGE_AWS_IAM_USER_ARN /<br/>STORAGE_AWS_EXTERNAL_ID to the IAM role<br/>on every apply (no manual flag – fires<br/>whenever a storage integration is configured)"]
end
subgraph P4["Phase 4 – Snowpipes (BRONZE layer)"]
P4A["module.pipe<br/>RAW_NORTHBRIDGE_PIPE (auto_ingest=true)"]
P4B["module.s3_notification<br/>S3 event → SQS wiring<br/>(enable_snowpipe_creation default true;<br/>set false only on the very first apply)"]
P4A --> P4B
end
subgraph P5["Phase 5 – Dynamic Tables (SILVER + GOLD layers)"]
direction TB
P5A["module.dynamic_table<br/>SILVER.CLEAN_NORTHBRIDGE_DT"]
P5B["module.dynamic_table_gold<br/>GOLD.DIM_CUSTOMER · DIM_PRODUCT · DIM_BRANCH<br/>GOLD.FACT_TRANSACTIONS · FACT_LOANS · FACT_ACCOUNT_BALANCES"]
P5A --> P5B
end
subgraph P6["Phase 6 – Views (GOLD layer)"]
direction TB
P6A["module.views<br/>V_KPI_SUMMARY · V_SEGMENT_STATS · V_LOAN_PORTFOLIO<br/>V_MONTHLY_TXN_TRENDS · V_REGIONAL_PERF · V_RISK_DISTRIBUTION"]
P6B["snowflake_grant_privileges_to_account_role.view_grants<br/>NORTHBRIDGE_ANALYST SELECT"]
P6A --> P6B
end
P1 --> P2 --> P3 --> P4 --> P5 --> P6
> [!NOTE]
> Streams and tasks are not used. Ingestion is handled by Snowpipe (auto-ingest from S3); downstream refresh is handled by Dynamic Tables with `target_lag = "downstream"`.
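The SILVER refresh pattern reduces to a single `CREATE DYNAMIC TABLE` statement. The sketch below is illustrative only: the real definition is rendered from `templates/dynamic-tables/clean_northbridge.tpl`, and the column names and VARIANT paths shown here are assumptions, not the actual template.

```sql
-- Illustrative sketch only – the real DDL comes from clean_northbridge.tpl via Terraform.
-- Column names and JSON paths are assumed for the example.
CREATE OR REPLACE DYNAMIC TABLE NORTHBRIDGE_DATABASE.SILVER.CLEAN_NORTHBRIDGE_DT
  TARGET_LAG = DOWNSTREAM            -- refresh only when a downstream GOLD table needs it
  WAREHOUSE  = TRANSFORM_WH
AS
SELECT
  raw_json:customer_id::STRING            AS customer_id,    -- assumed VARIANT column / path
  raw_json:compliance.kyc_status::STRING  AS kyc_status,
  load_timestamp                                              -- assumed BRONZE audit column
FROM NORTHBRIDGE_DATABASE.BRONZE.RAW_NORTHBRIDGE;
```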
> [!WARNING]
> Module pin caveats – known issues with the upstream modules pinned by this root module:

| Module | Pin | Issue |
|---|---|---|
| `module.api_integrations` | `feature/TFMOD-0001-…` | Awaiting a stable tag. Currently filters out the GitHub `git_https_api` entry because `snowflake_api_integration` does not accept that provider value. The correct resource for the Streamlit-deploy use case is a Git integration (`snowflake_git_repository`) – not yet wired. |
| `module.views` | `feature/TFMOD-0007-…` | Awaiting a stable tag. Its `versions.tf` must be upgraded from `Snowflake-Labs/snowflake < 1.0.0` to `snowflakedb/snowflake >= 1.0.0` before `terraform init` will resolve. |
customer360-snowflake-pipeline/
├── README.md
├── CLAUDE.md                                  # Claude Code project context
├── CHANGELOG.md                               # Auto-generated by git-cliff
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
├── LICENSE
├── PROMPT.md                                  # Claude Code prompt for config generation
├── cliff.toml                                 # git-cliff configuration
├── env.json
├── northbridge_customer360_architecture.svg   # Architecture diagram (vector)
├── project-architecture.jpg                   # Architecture diagram (raster)
│
├── .claude/
│   └── skills/                                # Project-local Claude Code skills
│       ├── aws-config-iam-policies/
│       ├── aws-config-s3/
│       ├── aws-config-trust/
│       ├── github-readme/
│       ├── snowflake-config-dynamic-tables-functions/
│       ├── snowflake-config-snowpipes/
│       ├── snowflake-config-stages-fileformats/
│       ├── snowflake-config-tables/
│       └── tf-{backend,locals,main,outputs,providers,variables,versions}/
│
├── .devcontainer/                             # Dev container definition
├── .editorconfig
├── .gitignore
├── .github/
│   └── workflows/
│       ├── ci.yaml                            # Terraform fmt / validate / security scans
│       ├── terraform-deploy.yaml              # Provision infra (Pass A → Pass B)
│       ├── terraform-destroy.yaml             # Tear down infra
│       ├── deploy-streamlit-app.yaml          # Refresh Streamlit app from this repo
│       ├── create-branch.yaml
│       ├── setup-project.yaml
│       └── notify.yaml
│
├── app/
│   └── northbridge_dashboard.py               # Streamlit in Snowflake dashboard
│
├── data/                                      # Synthetic source dataset (~329 MB across 6 parts)
│   └── northbridge_part0{0..5}_of_05.json
│
├── documentation/
│   └── NorthBridge Bank Unified Data Lake for Customer 360.pptx
│
├── keypair/                                   # RSA keypair – GITIGNORED
│   ├── snowflake_key.p8                       # Private key – never commit
│   └── snowflake_key.pub                      # Public key
│
├── post-deployment-validation/
│   ├── create-gold-views.sql
│   └── pipe-validation.sql
│
└── infra/platform/tf/                         # Terraform root module
    ├── main.tf                                # 6-phase orchestration
    ├── variables.tf
    ├── locals.tf
    ├── outputs.tf
    ├── validations.tf
    ├── backend.tf                             # HCP Terraform remote state
    ├── providers-aws.tf
    ├── providers-snowflake.tf                 # Multiple Snowflake provider aliases
    ├── versions.tf
    ├── modules/
    │   └── iam_role_final/                    # Local module – re-pushes IAM trust at apply time
    │       ├── main.tf
    │       ├── variables.tf
    │       └── outputs.tf
    ├── config/                                # JSON-driven resource definitions
    │   ├── aws/
    │   │   └── {devl,test,prod}/config.json
    │   └── snowflake/
    │       ├── config.json                    # Default / shared config
    │       └── {devl,test,prod}/config.json
    ├── environments/
    │   └── {devl,test,prod}/terraform.tfvars
    └── templates/
        ├── bucket-policy/
        │   └── s3-bucket-policy.tpl
        ├── dynamic-tables/
        │   ├── clean_northbridge.tpl          # SILVER cleansing
        │   ├── dim_customer.tpl               # GOLD customer dimension
        │   ├── dim_product.tpl                # GOLD product dimension
        │   ├── dim_branch.tpl                 # GOLD branch dimension
        │   ├── fact_transactions.tpl          # GOLD transaction fact
        │   ├── fact_loans.tpl                 # GOLD loan fact
        │   └── fact_account_balances.tpl      # GOLD account balance fact
        ├── views/
        │   ├── v_kpi_summary.tpl              # GOLD KPI summary view
        │   ├── v_loan_portfolio.tpl           # GOLD loan portfolio view
        │   ├── v_monthly_txn_trends.tpl       # GOLD monthly transaction trends
        │   ├── v_regional_perf.tpl            # GOLD regional performance
        │   ├── v_risk_distribution.tpl        # GOLD risk distribution
        │   └── v_segment_stats.tpl            # GOLD customer segment stats
        └── snowpipe-copy-statements/
            └── raw_northbridge_copy.tpl
| Tool | Version | Purpose |
|---|---|---|
| Terraform | `>= 1.14.1` | Infrastructure provisioning |
| Snowflake provider | `snowflakedb/snowflake >= 1.0.0` | Snowflake resources |
| AWS provider | `hashicorp/aws >= 5.0` | AWS resources |
| OpenSSL | Any current version | RSA keypair generation |
| HCP Terraform | Account required | Remote state + CI variable sets |
| Python | `>= 3.10` | Dataset generation only |
Snowflake uses RSA keypair authentication (JWT) instead of username/password. Generate the keys and store them in infra/platform/keypair/ (gitignored):
mkdir -p infra/platform/keypair && cd infra/platform/keypair
# Step 1 – Generate 2048-bit RSA private key in PKCS#8 (unencrypted) format
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out snowflake_key.p8 -nocrypt
# Step 2 – Derive the public key
openssl rsa -in snowflake_key.p8 -pubout -out snowflake_key.pub
# Step 3 – Extract the public key body (strip headers and newlines) for Snowflake
grep -v "BEGIN PUBLIC" snowflake_key.pub | grep -v "END PUBLIC" | tr -d '\n'
# Step 4 – Produce the HCP Terraform value for `snowflake_private_key`
# (base64-encodes the ENTIRE .p8 file – do NOT strip headers/newlines)
base64 < snowflake_key.p8 | tr -d '\n'

Copy the output of Step 3 – you will paste this into the RSA_PUBLIC_KEY clause of CREATE USER in §2a.
Copy the output of Step 4 – you will paste this (a single line, no trailing newline) into the `snowflake_private_key` Terraform variable in HCP in §3a. Do not strip the `-----BEGIN PRIVATE KEY-----` / `-----END PRIVATE KEY-----` markers or the internal newlines from the key before encoding: base64 encodes the file as-is, and the Terraform provider decodes it back into a valid PEM at runtime via `base64decode()` in `providers-snowflake.tf`.
Each Terraform module uses a dedicated, least-privilege role so that no single role has broader permissions than its scope of work. The role names below are the defaults from infra/platform/tf/variables.tf and can be overridden per environment via terraform.tfvars.
| Role | Scope | Used by Terraform module |
|---|---|---|
| `DB_PROVISIONER` | Create databases and schemas | `module.database_schemas` |
| `WAREHOUSE_PROVISIONER` | Create and manage warehouses | `module.warehouse` |
| `DATA_OBJECT_PROVISIONER` | Create tables, file formats, and dynamic tables | `module.table`, `module.file_formats`, `module.dynamic_table` |
| `INGEST_OBJECT_PROVISIONER` | Create storage integrations, stages, and pipes | `module.storage_integrations`, `module.stage`, `module.pipe` |
Run the SQL below as SECURITYADMIN / ACCOUNTADMIN. Replace `YOUR_PUBLIC_KEY_HERE` with the public key body produced in Step 3 of §1.
-- ============================================================================
-- GitHub Actions Service User + Least-Privilege Provisioner Roles
-- ============================================================================
USE ROLE SECURITYADMIN;
-- ----------------------------------------------------------------------------
-- 1) Create provisioner roles
-- ----------------------------------------------------------------------------
CREATE ROLE IF NOT EXISTS DB_PROVISIONER
COMMENT = 'Terraform: creates databases and schemas';
CREATE ROLE IF NOT EXISTS WAREHOUSE_PROVISIONER
COMMENT = 'Terraform: creates and manages warehouses';
CREATE ROLE IF NOT EXISTS DATA_OBJECT_PROVISIONER
COMMENT = 'Terraform: creates tables, file formats, and dynamic tables';
CREATE ROLE IF NOT EXISTS INGEST_OBJECT_PROVISIONER
COMMENT = 'Terraform: creates storage integrations, stages, and pipes';
-- Role hierarchy – all provisioners report to SYSADMIN for governance
GRANT ROLE DB_PROVISIONER TO ROLE SYSADMIN;
GRANT ROLE WAREHOUSE_PROVISIONER TO ROLE SYSADMIN;
GRANT ROLE DATA_OBJECT_PROVISIONER TO ROLE SYSADMIN;
GRANT ROLE INGEST_OBJECT_PROVISIONER TO ROLE SYSADMIN;
-- ----------------------------------------------------------------------------
-- 2) Grant account-level privileges (least privilege – only what each role needs)
-- ----------------------------------------------------------------------------
USE ROLE ACCOUNTADMIN;
-- DB_PROVISIONER – only allowed to create databases
GRANT CREATE DATABASE ON ACCOUNT TO ROLE DB_PROVISIONER;
-- WAREHOUSE_PROVISIONER – only allowed to create/manage warehouses
GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE WAREHOUSE_PROVISIONER;
GRANT MONITOR USAGE ON ACCOUNT TO ROLE WAREHOUSE_PROVISIONER;
-- INGEST_OBJECT_PROVISIONER – only allowed to create storage integrations
-- (stages/pipes are schema-level, granted post-database in Step 2c)
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE INGEST_OBJECT_PROVISIONER;
-- DATA_OBJECT_PROVISIONER – intentionally receives NO account-level privileges.
-- All its privileges are schema-scoped and granted in Step 2c after the
-- database/schemas are created by DB_PROVISIONER via Terraform.
-- All roles need a warehouse to run queries – the tiny UTIL_WH is fine here
GRANT USAGE ON WAREHOUSE UTIL_WH TO ROLE DB_PROVISIONER;
GRANT USAGE ON WAREHOUSE UTIL_WH TO ROLE WAREHOUSE_PROVISIONER;
GRANT USAGE ON WAREHOUSE UTIL_WH TO ROLE DATA_OBJECT_PROVISIONER;
GRANT USAGE ON WAREHOUSE UTIL_WH TO ROLE INGEST_OBJECT_PROVISIONER;
-- ----------------------------------------------------------------------------
-- 3) Create the GitHub Actions service user (keypair auth only, no password)
-- ----------------------------------------------------------------------------
USE ROLE SECURITYADMIN;
CREATE USER IF NOT EXISTS GITHUB_ACTIONS_USER
LOGIN_NAME = 'GITHUB_ACTIONS_USER'
DISPLAY_NAME = 'GitHub Actions Service User'
DEFAULT_ROLE = PUBLIC
DEFAULT_WAREHOUSE = NULL
MUST_CHANGE_PASSWORD = FALSE
DISABLED = FALSE
COMMENT = 'Service account for Terraform deployments via GitHub Actions'
RSA_PUBLIC_KEY = 'YOUR_PUBLIC_KEY_HERE';
-- ----------------------------------------------------------------------------
-- 4) Grant provisioner roles to the service user
-- ----------------------------------------------------------------------------
GRANT ROLE DB_PROVISIONER TO USER GITHUB_ACTIONS_USER;
GRANT ROLE WAREHOUSE_PROVISIONER TO USER GITHUB_ACTIONS_USER;
GRANT ROLE DATA_OBJECT_PROVISIONER TO USER GITHUB_ACTIONS_USER;
GRANT ROLE INGEST_OBJECT_PROVISIONER TO USER GITHUB_ACTIONS_USER;
-- ----------------------------------------------------------------------------
-- 5) Verify
-- ----------------------------------------------------------------------------
SHOW USERS LIKE 'GITHUB_ACTIONS_USER';
SHOW GRANTS TO USER GITHUB_ACTIONS_USER;
SHOW GRANTS TO ROLE DB_PROVISIONER;
SHOW GRANTS TO ROLE WAREHOUSE_PROVISIONER;
SHOW GRANTS TO ROLE DATA_OBJECT_PROVISIONER;
SHOW GRANTS TO ROLE INGEST_OBJECT_PROVISIONER;

Run the next block as ACCOUNTADMIN. Replace `<ANALYST_USERNAME>` as required; the database and schema names assume the Terraform defaults (NORTHBRIDGE_DATABASE, GOLD, STREAMLIT).
-- ============================================================================
-- Analyst Role – Read-Only Access
-- ============================================================================
CREATE ROLE IF NOT EXISTS NORTHBRIDGE_ANALYST
COMMENT = 'Read-only access to GOLD schema tables, views, and UDFs';
GRANT ROLE NORTHBRIDGE_ANALYST TO ROLE SYSADMIN;
-- Warehouse access
GRANT USAGE ON WAREHOUSE STREAMLIT_WH TO ROLE NORTHBRIDGE_ANALYST;
-- Database and schema access
GRANT USAGE ON DATABASE NORTHBRIDGE_DATABASE TO ROLE NORTHBRIDGE_ANALYST;
GRANT USAGE ON SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE NORTHBRIDGE_ANALYST;
GRANT USAGE ON SCHEMA NORTHBRIDGE_DATABASE.STREAMLIT TO ROLE NORTHBRIDGE_ANALYST;
-- Current and future object grants
GRANT SELECT ON ALL TABLES IN SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE NORTHBRIDGE_ANALYST;
GRANT SELECT ON ALL VIEWS IN SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE NORTHBRIDGE_ANALYST;
GRANT USAGE ON ALL FUNCTIONS IN SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE NORTHBRIDGE_ANALYST;
GRANT SELECT ON FUTURE TABLES IN SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE NORTHBRIDGE_ANALYST;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE NORTHBRIDGE_ANALYST;
GRANT USAGE ON FUTURE FUNCTIONS IN SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE NORTHBRIDGE_ANALYST;
-- Grant to analyst users
GRANT ROLE NORTHBRIDGE_ANALYST TO USER <ANALYST_USERNAME>;

After Terraform Phase 2 creates the database and schemas, run these grants as ACCOUNTADMIN to give the provisioner roles the schema-scoped privileges they need to build downstream objects:
USE ROLE ACCOUNTADMIN;
-- ----------------------------------------------------------------------------
-- DATA_OBJECT_PROVISIONER – table / file format / dynamic table creation
-- ----------------------------------------------------------------------------
GRANT USAGE ON DATABASE NORTHBRIDGE_DATABASE TO ROLE DATA_OBJECT_PROVISIONER;
GRANT USAGE ON SCHEMA NORTHBRIDGE_DATABASE.BRONZE TO ROLE DATA_OBJECT_PROVISIONER;
GRANT USAGE ON SCHEMA NORTHBRIDGE_DATABASE.SILVER TO ROLE DATA_OBJECT_PROVISIONER;
GRANT USAGE ON SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE DATA_OBJECT_PROVISIONER;
GRANT CREATE FILE FORMAT ON SCHEMA NORTHBRIDGE_DATABASE.BRONZE TO ROLE DATA_OBJECT_PROVISIONER;
GRANT CREATE TABLE ON SCHEMA NORTHBRIDGE_DATABASE.BRONZE TO ROLE DATA_OBJECT_PROVISIONER;
GRANT CREATE DYNAMIC TABLE ON SCHEMA NORTHBRIDGE_DATABASE.SILVER TO ROLE DATA_OBJECT_PROVISIONER;
GRANT CREATE TABLE ON SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE DATA_OBJECT_PROVISIONER;
GRANT CREATE VIEW ON SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE DATA_OBJECT_PROVISIONER;
GRANT CREATE FUNCTION ON SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE DATA_OBJECT_PROVISIONER;
GRANT CREATE DYNAMIC TABLE ON SCHEMA NORTHBRIDGE_DATABASE.GOLD TO ROLE DATA_OBJECT_PROVISIONER;
-- GOLD dynamic tables source from SILVER.CLEAN_NORTHBRIDGE_DT, so the role must be able to read it
GRANT SELECT ON ALL DYNAMIC TABLES IN SCHEMA NORTHBRIDGE_DATABASE.SILVER TO ROLE DATA_OBJECT_PROVISIONER;
GRANT SELECT ON FUTURE DYNAMIC TABLES IN SCHEMA NORTHBRIDGE_DATABASE.SILVER TO ROLE DATA_OBJECT_PROVISIONER;
-- ----------------------------------------------------------------------------
-- INGEST_OBJECT_PROVISIONER – stage / pipe creation (integrations already granted in 2a)
-- ----------------------------------------------------------------------------
GRANT USAGE ON DATABASE NORTHBRIDGE_DATABASE TO ROLE INGEST_OBJECT_PROVISIONER;
GRANT USAGE ON SCHEMA NORTHBRIDGE_DATABASE.BRONZE TO ROLE INGEST_OBJECT_PROVISIONER;
GRANT CREATE STAGE ON SCHEMA NORTHBRIDGE_DATABASE.BRONZE TO ROLE INGEST_OBJECT_PROVISIONER;
GRANT CREATE PIPE ON SCHEMA NORTHBRIDGE_DATABASE.BRONZE TO ROLE INGEST_OBJECT_PROVISIONER;
-- Snowpipe runs COPY INTO as the pipe owner, so the role needs SELECT + INSERT on the target table
GRANT SELECT ON FUTURE TABLES IN SCHEMA NORTHBRIDGE_DATABASE.BRONZE TO ROLE INGEST_OBJECT_PROVISIONER;
GRANT INSERT ON FUTURE TABLES IN SCHEMA NORTHBRIDGE_DATABASE.BRONZE TO ROLE INGEST_OBJECT_PROVISIONER;

To verify the keypair was assigned correctly:
DESC USER GITHUB_ACTIONS_USER;
-- Look for RSA_PUBLIC_KEY_FP – should show SHA256:...

To rotate the key later:

ALTER USER GITHUB_ACTIONS_USER SET RSA_PUBLIC_KEY = '<new public key body>';

Variables are split between HCP Variable Sets (secrets and account-level values shared across all environments) and per-environment `.tfvars` files (non-sensitive values that vary per environment).
In HCP Terraform: Organization Settings → Variable sets → Create variable set

- Name: `SNOWFLAKE_CREDENTIALS`
- Scope: Apply to all projects and workspaces
| Variable | Category | HCL | Sensitive | How to obtain |
|---|---|---|---|---|
| `snowflake_private_key` | Terraform | No | Yes | `base64 < infra/platform/keypair/snowflake_key.p8 \| tr -d '\n'` (run from repo root; works on macOS and Linux) |
| `TF_VAR_snowflake_organization_name` | Environment | N/A | No | `SELECT CURRENT_ORGANIZATION_NAME();` in Snowflake |
| `TF_VAR_snowflake_account_name` | Environment | N/A | No | `SELECT CURRENT_ACCOUNT_NAME();` in Snowflake |
| `TF_VAR_snowflake_user` | Environment | N/A | No | `GITHUB_ACTIONS_USER` |
> [!IMPORTANT]
> - For `snowflake_private_key` – set Category = Terraform, HCL = unchecked, Sensitive = checked. The value must be the base64-encoded entire `.p8` file, including the `-----BEGIN PRIVATE KEY-----` / `-----END PRIVATE KEY-----` markers and internal newlines. The Terraform provider decodes it with `base64decode()` at runtime (see `providers-snowflake.tf:27`). Do not pre-strip the PEM headers or newlines – doing so yields raw DER bytes and JWT auth will fail.
> - For the three `TF_VAR_*` variables – set Category = Environment. HCP injects them as OS env vars and Terraform picks them up automatically.
> - Do not create `SNOWFLAKE_PRIVATE_KEY` as an environment variable – HCP strips newlines from env vars, breaking the PEM format.
> - When copying the base64 value, do not include any trailing `%` shown by your shell.
Before saving the HCP variable, round-trip the value locally and confirm the first decoded line is the PEM header:
base64 < infra/platform/keypair/snowflake_key.p8 | tr -d '\n' | base64 -d | head -1
# Expect exactly: -----BEGIN PRIVATE KEY-----

If the first line is anything else (garbled binary, or just a base64 blob), you either (a) stripped the PEM markers before encoding, (b) encoded the public key by mistake, or (c) the file is not PKCS#8 – regenerate per §1 and re-encode.
HCP Terraform silently preserves any trailing newline or CRLF pasted into an env-var field. A stray \r\n on TF_VAR_snowflake_account_name, TF_VAR_snowflake_organization_name, or TF_VAR_snowflake_user is URL-encoded into the Snowflake login URL at plan time and produces errors like:
Error: open snowflake connection: parse
"https://ORG-ACCOUNT%0D%0A.snowflakecomputing.com:443/...": invalid URL escape "%0D"
When saving these values in the HCP UI: clear the field completely, retype the value, and do not press Enter before clicking Save.
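To avoid hand-typing those values at all, you can pull the organization and account names straight from Snowflake and copy them out of the results grid. These are standard context functions; the query is just a convenience:

```sql
-- Canonical values for TF_VAR_snowflake_organization_name and TF_VAR_snowflake_account_name
SELECT CURRENT_ORGANIZATION_NAME() AS org_name,
       CURRENT_ACCOUNT_NAME()      AS account_name;
```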
Create a second variable set named AWS_VARIABLE_SET (applied to all projects and workspaces):
| Variable | Category | Sensitive | Value |
|---|---|---|---|
| `AWS_ACCESS_KEY_ID` | Environment | No | Access key of the deployer IAM user |
| `AWS_SECRET_ACCESS_KEY` | Environment | Yes | Secret key of the deployer IAM user |
These live in infra/platform/tf/environments/{devl,test,prod}/terraform.tfvars and are committed to version control. They vary per environment:
| Variable | Description | Example (devl) |
|---|---|---|
| `db_provisioner_role` | Role for database/schema ops | `PLATFORM_DB_OWNER` |
| `warehouse_provisioner_role` | Role for warehouse ops | `WAREHOUSE_ADMIN` |
| `data_object_provisioner_role` | Role for table/file format ops | `DATA_OBJECT_ADMIN` |
| `ingest_object_provisioner_role` | Role for stage/pipe ops | `INGEST_ADMIN` |
| `snowflake_warehouse` | Default warehouse for Terraform ops | `UTIL_WH` |
| `aws_config_path` | Path to AWS config JSON | `config/aws/devl/config.json` |
| `snowflake_config_path` | Path to Snowflake config JSON | `config/snowflake/devl/config.json` |
| `project_code` | Short prefix for resource naming | `cust360sf` |
For local development, copy the environment tfvars and export Snowflake connection variables:
cd infra/platform/tf
cp environments/devl/terraform.tfvars terraform.tfvars
# Export Snowflake connection variables (these come from the HCP Variable Set in CI)
export SNOWFLAKE_PRIVATE_KEY="$(cat ../../keypair/snowflake_key.p8)"
export TF_VAR_snowflake_organization_name="YOUR_ORG"
export TF_VAR_snowflake_account_name="YOUR_ACCOUNT"
export TF_VAR_snowflake_user="GITHUB_ACTIONS_USER"

The IAM trust policy is reconciled at apply time from the live storage integration output – there is no manual DESC INTEGRATION / config-edit step. On the very first apply, Snowpipe is gated off so it does not race the trust sync; every subsequent apply runs in one pass.
cd infra/platform/tf
terraform init
terraform validate
terraform fmt -recursive
# Pass A – fresh bootstrap only. Creates core infra; IAM trust is auto-reconciled
# from the live storage integration output. Snowpipe is gated off so it doesn't
# race the trust sync on the first apply.
terraform apply -var-file="terraform.tfvars" -var="enable_snowpipe_creation=false"
# Pass B – every subsequent apply. Default for enable_snowpipe_creation is true,
# so no flag is needed. This enables Snowpipe + S3 event notification.
terraform apply -var-file="terraform.tfvars"

Optional sanity check after Pass A:
SELECT SYSTEM$VALIDATE_STORAGE_INTEGRATION('S3_STORAGE_INTEGRATION');
aws s3 cp data/ s3://northbridge-raw-data/raw-data/json/ --recursive --exclude "*" --include "*.json"

Snowpipe (RAW_NORTHBRIDGE_PIPE) auto-ingests files into BRONZE.RAW_NORTHBRIDGE within seconds of upload via the S3 event notification → SQS path.
-- BRONZE: check raw ingestion
SELECT COUNT(*) FROM NORTHBRIDGE_DATABASE.BRONZE.RAW_NORTHBRIDGE;
-- Expected: 25,000
-- SILVER: check dynamic table refresh
SELECT COUNT(*) FROM NORTHBRIDGE_DATABASE.SILVER.CLEAN_NORTHBRIDGE_DT;
-- GOLD: check fact table population
SELECT COUNT(*) FROM NORTHBRIDGE_DATABASE.GOLD.FACT_TRANSACTIONS;
-- Expected: ~312,000
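-- (Optional) Per-file Snowpipe load detail via the standard INFORMATION_SCHEMA.COPY_HISTORY
-- table function; widen the time window as needed
SELECT FILE_NAME, STATUS, ROW_COUNT, FIRST_ERROR_MESSAGE
FROM TABLE(NORTHBRIDGE_DATABASE.INFORMATION_SCHEMA.COPY_HISTORY(
       TABLE_NAME => 'BRONZE.RAW_NORTHBRIDGE',
       START_TIME => DATEADD('hour', -1, CURRENT_TIMESTAMP())));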
-- Check Snowpipe status
SELECT SYSTEM$PIPE_STATUS('NORTHBRIDGE_DATABASE.BRONZE.RAW_NORTHBRIDGE_PIPE');

Upload app/northbridge_dashboard.py via the Snowflake console:
Projects → Streamlit → + Streamlit App → select STREAMLIT_WH and schema NORTHBRIDGE_DATABASE.STREAMLIT
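If you prefer to script this instead of clicking through the console, a stage-based `CREATE STREAMLIT` statement along the following lines should work. The app name is hypothetical, and it assumes `northbridge_dashboard.py` has already been uploaded (e.g. with `PUT`) to the `STREAMLIT_STG` stage created by `module.stage`, here assumed to live in the `STREAMLIT` schema:

```sql
-- Sketch only: app name is hypothetical; stage location assumes STREAMLIT_STG sits in the STREAMLIT schema
CREATE OR REPLACE STREAMLIT NORTHBRIDGE_DATABASE.STREAMLIT.CUSTOMER360_DASHBOARD
  ROOT_LOCATION   = '@NORTHBRIDGE_DATABASE.STREAMLIT.STREAMLIT_STG'
  MAIN_FILE       = 'northbridge_dashboard.py'
  QUERY_WAREHOUSE = STREAMLIT_WH;
```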
| Role | Privilege | Used by |
|---|---|---|
| `DB_PROVISIONER` | `CREATE DATABASE` on account | `module.database_schemas` |
| `WAREHOUSE_PROVISIONER` | `CREATE WAREHOUSE`, `MONITOR USAGE` on account | `module.warehouse` |
| `DATA_OBJECT_PROVISIONER` | `CREATE TABLE`, `CREATE FILE FORMAT`, `CREATE DYNAMIC TABLE`, `CREATE VIEW`, `CREATE FUNCTION` on schemas | `module.table`, `module.file_formats`, `module.dynamic_table`, `module.dynamic_table_gold`, `module.views` |
| `INGEST_OBJECT_PROVISIONER` | `CREATE INTEGRATION` on account; `CREATE STAGE`, `CREATE PIPE` on schemas | `module.storage_integrations`, `module.api_integrations`, `module.stage`, `module.pipe` |
| `NORTHBRIDGE_ANALYST` | `SELECT` on GOLD tables/views; `USAGE` on GOLD functions | Dashboard users |
| Name | Size | Purpose |
|---|---|---|
| `LOAD_WH` | MEDIUM | Snowpipe ingestion + COPY operations |
| `TRANSFORM_WH` | X-SMALL | Dynamic Table refresh, BRONZE → SILVER → GOLD |
| `STREAMLIT_WH` | X-SMALL | Dashboard queries |
| `ADHOC_WH` | X-SMALL | Development + ad-hoc debugging |
All warehouses: auto_resume = true, auto_suspend = 60s, initially_suspended = true.
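For readers more comfortable with SQL than Terraform, each warehouse definition is equivalent to DDL like the following (illustrative only; the project provisions these through `module.warehouse` and the config JSON, not by hand):

```sql
-- DDL equivalent of one Terraform-managed warehouse (illustrative)
CREATE WAREHOUSE IF NOT EXISTS TRANSFORM_WH
  WAREHOUSE_SIZE      = 'XSMALL'
  AUTO_RESUME         = TRUE
  AUTO_SUSPEND        = 60      -- seconds
  INITIALLY_SUSPENDED = TRUE;
```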
| Schema | Layer | Purpose |
|---|---|---|
| `BRONZE` | Raw | Landing zone – VARIANT JSON, audit columns |
| `SILVER` | Cleansed | Typed, deduplicated, validated dynamic table |
| `GOLD` | Curated | Fact & dimension tables, analytical views, UDFs |
| `STREAMLIT` | Serving | Presentation layer for dashboard |
Five tabs backed by GOLD views, with sidebar filters for Customer Segment, Risk Rating, and Region.
| Tab | Key visuals |
|---|---|
| Executive KPIs | AUM, NPL ratio, avg credit score, AUM treemap by region |
| Customer Insights | Income & credit score distributions, segment and employment mix |
| Loan Portfolio | Loan book by type, status distribution, rate vs amount scatter |
| Transactions | Monthly volume trend, channel mix, failed transaction trend |
| Risk & Compliance | KYC status, AML exposure, credit tier vs risk heatmap |
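Each sidebar filter simply becomes a predicate on the backing GOLD view. A hypothetical example of the kind of query the dashboard issues (the filter column names are assumptions about `V_SEGMENT_STATS`, not documented columns):

```sql
-- Hypothetical filtered query – column names are assumed, not taken from the view definitions
SELECT *
FROM NORTHBRIDGE_DATABASE.GOLD.V_SEGMENT_STATS
WHERE CUSTOMER_SEGMENT = 'Premium'
  AND RISK_RATING      = 'High'
  AND REGION           = 'Northeast';
```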
| Metric | Value |
|---|---|
| Customers | 25,000 |
| Total nested records | ~495,000 |
| Accounts | ~75,000 |
| Loans | ~35,000 |
| Transactions | ~312,000 |
| JSON schema version | 3.0 |
| Transaction date range | 2022–2024 |
| Total file size | ~329 MB (5 × ~66 MB parts) |
Each customer record contains 9 nested objects: accounts[], loans[], credit_cards[], transactions[], alerts[], digital_profile{}, compliance{}, financial_summary{}, employment{}.
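The SILVER and GOLD layers unpack these arrays with `LATERAL FLATTEN`. A minimal sketch of the pattern (the VARIANT column name `RAW_JSON` and the transaction field names are assumptions about `BRONZE.RAW_NORTHBRIDGE`, not the actual schema):

```sql
-- Sketch: explode the nested transactions[] array into one row per transaction.
-- RAW_JSON and the field names are assumed; check the BRONZE table DDL for the real names.
SELECT
  r.RAW_JSON:customer_id::STRING   AS customer_id,
  t.value:transaction_id::STRING   AS transaction_id,
  t.value:amount::NUMBER(18,2)     AS amount,
  t.value:channel::STRING          AS channel
FROM NORTHBRIDGE_DATABASE.BRONZE.RAW_NORTHBRIDGE r,
     LATERAL FLATTEN(INPUT => r.RAW_JSON:transactions) t;
```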
To regenerate the synthetic dataset (deterministic, seed 2025):
python3 scripts/gen_large.py

| Term | Definition |
|---|---|
| AUM | Assets Under Management – total deposit balances |
| NPL Ratio | Non-Performing Loan ratio – at-risk loan value / total loan book |
| KYC | Know Your Customer – Verified, Pending, Flagged, Expired |
| AML | Anti-Money Laundering – risk score 0–100 |
| PEP | Politically Exposed Person |
| EMI | Equated Monthly Instalment |
| At-Risk Loan | Loan with status Defaulted or Delinquent |
| SiS | Streamlit in Snowflake |
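As a worked example of the NPL Ratio definition above, the metric could be computed from `FACT_LOANS` along these lines (the column names `LOAN_STATUS` and `OUTSTANDING_BALANCE` are assumptions, not the documented fact-table schema):

```sql
-- NPL ratio = value of at-risk loans (Defaulted or Delinquent) / total loan book.
-- Column names are assumed for illustration.
SELECT
  SUM(CASE WHEN LOAN_STATUS IN ('Defaulted', 'Delinquent') THEN OUTSTANDING_BALANCE ELSE 0 END)
    / NULLIF(SUM(OUTSTANDING_BALANCE), 0) AS npl_ratio
FROM NORTHBRIDGE_DATABASE.GOLD.FACT_LOANS;
```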
cd infra/platform/tf
terraform destroy -var-file="terraform.tfvars"

Ensure S3 buckets are empty before destroying. If the bucket has versioned objects, set `force_destroy: true` in the environment's AWS config (`config/aws/<env>/config.json`) first, then apply, then destroy.
MIT