Heroically Excessive Inference Methodology (for) Data Analytics (on) Large Loads
A telemetry-to-insight pipeline for robotics and autonomous systems. Turns fleet telemetry into natural-language insights via GPU-accelerated data loading (cuDF + UVM), NVIDIA NIM on GKE for LLM inference, and model format selection for production deployment.
Named after Heimdall, the Norse guardian who watches from on high and sees and hears everything. Like him, this pipeline watches over your fleet telemetry from the "cloud" and turns it into insights.
Imagine a fleet of a thousand Waymo autonomous vehicles, Tesla Optimus or Unitree G1 humanoid units, or similar, each generating streams of telemetry. You need to identify which units had anomalous brake events or abnormal sensor measurements last week, or which vehicles exceeded a speed threshold in a given region, or which robots showed elevated motor temperatures during a deployment. Manually querying and cross-referencing that data across hundreds of assets does not scale.
H.E.I.M.D.A.L.L addresses this. You load your fleet telemetry into the pipeline, then ask natural-language questions such as "Which vehicles had brake pressure above 90% in the last 24 hours?" or "List robots with gyro z-axis variance exceeding 0.5." The system returns responses with vehicle or robot IDs, timestamps, and relevant metrics, enabling rapid insights and operational visibility across large fleets of cars, autonomous vehicles, and robots without writing complex queries.
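Under the hood, each natural-language question maps to a simple dataframe filter. A minimal sketch of the first example query in plain pandas (column names and data are illustrative, not the pipeline's actual schema); with `cudf.pandas` loaded, the identical code runs on the GPU:

```python
import pandas as pd

# Illustrative telemetry frame; the real pipeline loads fleet data via cuDF + UVM.
telemetry = pd.DataFrame({
    "vehicle_id": ["wm-001", "wm-002", "wm-003", "wm-001"],
    "timestamp": pd.to_datetime([
        "2024-05-01 08:00", "2024-05-01 09:30",
        "2024-04-28 12:00", "2024-05-01 10:15",
    ]),
    "brake_pressure_pct": [95.2, 41.0, 97.8, 92.1],
})

# "Which vehicles had brake pressure above 90% in the last 24 hours?"
cutoff = pd.Timestamp("2024-05-01 12:00") - pd.Timedelta(hours=24)
hits = telemetry[(telemetry["brake_pressure_pct"] > 90)
                 & (telemetry["timestamp"] >= cutoff)]
print(sorted(hits["vehicle_id"].unique()))  # → ['wm-001']
```

The NL layer's job is to generate and run this kind of filter for you, then phrase the result as an answer.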
- Introduction
- What You Need
- Choose Your Path
- Quick Start (Notebooks 01 & 02)
- Notebooks
- Setup for Notebook 03 (NIM on GKE)
- Architecture
- Results & Takeaways
- Troubleshooting
- Real Data
- Contributing
- Code of Conduct
| To run… | You need |
|---|---|
| Notebook 01 (Data Ingest) | Colab account, GPU runtime (L4/T4 recommended) |
| Notebook 02 (Local Inference) | Colab + GPU + Hugging Face token (for Gemma 2) |
| Notebook 03 (Full Pipeline) | Colab + GCP account (billing) + NGC API key + NIM on GKE |
┌─────────────────────────────────────────────────────────────────┐
│ New to the project? │
│ → Start with Notebook 01 (Data Ingest pandas/cuDF/cudf.pandas) │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Want natural-language telemetry Q&A on your machine? │
│ → Run Notebook 02 (Local Inference) with Gemma 2 2B │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Need production-scale inference in the cloud? │
│ → Deploy NIM on GKE, then run Notebook 03 │
└─────────────────────────────────────────────────────────────────┘
Prerequisites: Google account (for Colab), GPU runtime (T4 is default; L4 works too).
- Open 01 Data Ingest (~5 min) and 02 Inference Pipeline (~10 min first run).
- Runtime → Change runtime type → Hardware accelerator: T4 GPU → Save. (L4 if available.)
- Notebook 02 only: Add your Hugging Face token so the notebook can load Gemma 2:
- Go to huggingface.co/google/gemma-2-2b-it and accept the license.
- Create a token at huggingface.co/settings/tokens (click New token, set role to Read).
- In Colab: click the key icon in the left sidebar → Add new secret → name: `HF_TOKEN`, value: paste your token → Save.
- Run all cells.
Run the data ingest notebook on Vertex AI with an NVIDIA L4 GPU, following the same flow as the Accelerated Data Analytics with GPUs Codelab.
- Go to Google Cloud Console
- Navigation menu → Vertex AI → Colab Enterprise
- Click Runtime templates → New template
- Under Runtime basics:
  - Display name: `gpu-l4-template`
  - Region: your preferred region (e.g. `us-central1`)
- Under Configure compute:
  - Machine type: `g2-standard-4` (1× NVIDIA L4 GPU)
  - Idle shutdown: 60 minutes (or as desired)
- Click Create
Note: If you see `NVIDIA_L4_GPUS exceeded`, your L4 quota is used up or too low. See Troubleshooting: L4 quota below. `g2-standard-8` (1× L4) and `g2-standard-16` (2× L4) are alternatives if you have quota.
- Click Runtimes → Create
- Under Runtime template, select `gpu-l4-template`
- Click Create and wait for the runtime to boot (a few minutes)
- Click My notebooks → Import
- Select URL and paste: https://github.com/KarthikSriramGit/H.E.I.M.D.A.L.L/blob/main/notebooks/01_data_ingest_benchmark.ipynb
- Click Import
- Open the imported notebook
- Click the Connect dropdown → Connect to a Runtime
- Select your `gpu-l4-template` runtime → Connect
- Run the setup cell (clone + pip install); cuDF will use the L4 GPU.
- Run all cells. The benchmark will execute on the L4.
To enable zero-code-change GPU acceleration for pandas:
```python
%load_ext cudf.pandas
import pandas as pd
```

Reference: Accelerated Data Analytics with Google Cloud and NVIDIA Codelab
For quick runs without GCP setup: open the notebook in Colab, then Runtime → Change runtime type → GPU (L4) when available (e.g. Colab Pro).
| Notebook | What it does | Requirements |
|---|---|---|
| 01 Data Ingest | cuDF + UVM loading, pandas vs cuDF vs cudf.pandas benchmark | GPU (L4) |
| 02 Inference Pipeline | Format selection, Gemma 2 2B local inference | GPU + HF token |
| 03 Query Telemetry | Full pipeline with NIM (Llama 3 8B on GKE) | NIM deployed (see below) |
Notebook 03 requires NIM running on GKE. Follow these steps.
- Google account with Google Cloud (billing enabled)
- NGC account for the API key
- Go to console.cloud.google.com
- Create a project or select an existing one
- Enable billing for the project
- Note your Project ID (e.g. `heimdall-487621`)
- Go to APIs & Services → Library
- Enable Kubernetes Engine API and Compute Engine API
- Go to ngc.nvidia.com and sign in
- Profile (top right) → Setup → Generate API Key
- Copy the key (starts with `nvapi-`)
Open Google Cloud Shell and run:

```bash
# Set your values
export PROJECT_ID="your-gcp-project-id"
export ZONE="us-central1-a"
export NGC_CLI_API_KEY="your-ngc-api-key"

# Configure gcloud
gcloud config set project $PROJECT_ID

# Create cluster and GPU node pool (skip if they already exist)
gcloud container clusters create nim-demo \
  --project=$PROJECT_ID \
  --location=$ZONE \
  --release-channel=rapid \
  --machine-type=e2-standard-4 \
  --num-nodes=1

gcloud container node-pools create gpupool \
  --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest \
  --project=$PROJECT_ID \
  --location=$ZONE \
  --cluster=nim-demo \
  --machine-type=g2-standard-16 \
  --num-nodes=1

# Connect to cluster
gcloud container clusters get-credentials nim-demo --zone=$ZONE --project=$PROJECT_ID

# Clone repo and deploy NIM
git clone -q https://github.com/KarthikSriramGit/H.E.I.M.D.A.L.L.git
cd H.E.I.M.D.A.L.L
bash src/deploy/gke/deploy_nim.sh

# Expose NIM with LoadBalancer
kubectl patch svc my-nim-nim-llm -n nim -p '{"spec": {"type": "LoadBalancer"}}'
kubectl get svc -n nim
```

If the cluster already exists, run only:
```bash
export PROJECT_ID="your-gcp-project-id"
export ZONE="us-central1-a"
export NGC_CLI_API_KEY="your-ngc-api-key"

gcloud config set project $PROJECT_ID
gcloud container clusters get-credentials nim-demo --zone=$ZONE --project=$PROJECT_ID

git clone -q https://github.com/KarthikSriramGit/H.E.I.M.D.A.L.L.git
cd H.E.I.M.D.A.L.L
bash src/deploy/gke/deploy_nim.sh

kubectl patch svc my-nim-nim-llm -n nim -p '{"spec": {"type": "LoadBalancer"}}'
kubectl get svc -n nim
```

- Wait for EXTERNAL-IP to appear (not `<pending>`): `kubectl get svc -n nim`
- Wait for the pod to be Running: `kubectl get pods -n nim`
- Note your NIM URL: `http://EXTERNAL_IP:8000` (replace `EXTERNAL_IP` with the IP from `kubectl get svc -n nim`)
Do not commit this URL. Keep it private.
- Open 03 Query Telemetry in Colab
- Store your NIM URL securely using Colab Secrets:
  - Click the key icon (Secrets) in the left sidebar
  - Add: Name = `NIM_BASE_URL`, Value = `http://YOUR_EXTERNAL_IP:8000`
  - Toggle Notebook access to ON
- Run all cells
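Notebook 03 talks to NIM through its OpenAI-compatible HTTP API. A hedged sketch of the round trip using only the standard library (the model id `meta/llama3-8b-instruct` and payload fields are assumptions; confirm the deployed model name via `GET /v1/models` on your endpoint):

```python
import json
from urllib import request

def build_chat_request(nim_base_url: str, question: str):
    """Build an OpenAI-style chat completion request for the NIM endpoint."""
    url = f"{nim_base_url}/v1/chat/completions"
    payload = {
        "model": "meta/llama3-8b-instruct",  # assumed id; confirm via GET /v1/models
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 256,
        "temperature": 0.2,
    }
    return url, payload

url, payload = build_chat_request(
    "http://EXTERNAL_IP:8000",  # your NIM URL (Colab secret NIM_BASE_URL)
    "Which vehicles had brake pressure above 90% in the last 24 hours?",
)

def ask_nim(url: str, payload: dict) -> str:
    """POST the request and return the model's answer text."""
    req = request.Request(url, data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# print(ask_nim(url, payload))  # run once the LoadBalancer IP is reachable
```

The notebook's own client wraps the same endpoint; this sketch is just the shape of the request.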
To avoid ongoing GCP charges:

```bash
kubectl delete namespace nim
gcloud container clusters delete nim-demo --zone=$ZONE --project=$PROJECT_ID --quiet
```

```
Data Layer                      Inference Layer                Deployment Layer
----------------                ----------------               ----------------
Synthetic Generator    -->      cuDF + UVM Loader      ----->  Format Selector
Benchmark                       Inference Pipeline
(pandas, cuDF, cudf.pandas)     Metrics (p50, p90, TTFT)
Query Engine           <--      NIM Client             <--     NIM on GKE
```
Benchmark results from running the notebooks on Colab: L4 GPU for notebook 01, T4 GPU for notebook 02, and NIM (L4 on GKE) for notebook 03.
The benchmark produces a single figure with 3 plots (time comparison, memory comparison, cuDF speedup) and a summary table.
Run notebook 01 to regenerate this figure; it is saved to docs/assets/01_benchmark_pandas_vs_cudf.png.
Takeaway: cuDF gives ~5× faster load and ~10–13× faster groupby/sort, with negligible host memory (data stays in GPU VRAM via UVM spill). cudf.pandas achieves similar GPU performance using the same pandas API—zero code change required. Use cuDF for explicit control, cudf.pandas to accelerate existing pandas code.
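The benchmark's shape can be sketched in a few lines of plain pandas (synthetic columns, row count scaled down; an assumption-laden sketch, not the notebook's exact code). With `%load_ext cudf.pandas` in the first cell, this identical code is what runs on the GPU:

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100_000  # the notebook uses millions of rows; scaled down here
df = pd.DataFrame({
    "unit_id": rng.integers(0, 1_000, n),
    "motor_temp_c": rng.normal(55.0, 8.0, n),
})

# Time the groupby+sort pattern the benchmark measures
t0 = time.perf_counter()
summary = (df.groupby("unit_id")["motor_temp_c"]
             .agg(["mean", "max"])
             .sort_values("max", ascending=False))
elapsed = time.perf_counter() - t0

print(f"groupby+sort over {n:,} rows: {elapsed * 1000:.1f} ms")
print(summary.head())
```

Because `cudf.pandas` intercepts the pandas API, the CPU and GPU runs use the same source, which is what makes the comparison fair.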
Natural-language answers to telemetry queries on a T4 GPU. Typical run: ~21s total for 5 queries (~4.2s avg latency).
Takeaway: Local GGUF inference works for low-latency, offline telemetry Q&A. Good for prototyping; heavier workloads may need cloud scaling.
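Before generation, the pipeline has to turn query hits into a prompt the model can ground its answer in. A minimal sketch of that step (function name and template are hypothetical, not the notebook's actual prompt format):

```python
def build_prompt(question: str, rows: list) -> str:
    """Format telemetry hits as context lines, then append the user question."""
    context = "\n".join(
        f"- {r['unit_id']} @ {r['timestamp']}: {r['metric']}={r['value']}"
        for r in rows
    )
    return (
        "You are a fleet telemetry assistant. Answer using only this data:\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Which robots showed elevated motor temperatures?",
    [{"unit_id": "g1-07", "timestamp": "2024-05-01T10:15Z",
      "metric": "motor_temp_c", "value": 81.4}],
)
print(prompt)
```

The same prompt string can then be passed to the local Gemma 2 model or to NIM, which is what keeps notebooks 02 and 03 comparable.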
Same queries via Llama 3 8B on NVIDIA NIM (L4 on GKE). Longer answers, higher latency due to model size and network.
Takeaway: NIM on GKE scales inference for production. Use notebook 02 for fast local iteration and notebook 03 for production-style deployment.
| Issue | Solution |
|---|---|
| L4 quota exceeded | See L4 quota exceeded below. |
| cuDF fails to import | Ensure you selected a GPU runtime (L4 or T4). cuDF requires a GPU. |
| "No GPU" or cuDF falls back to CPU | Runtime → Change runtime type → GPU (L4 or T4) → Save, then restart runtime. |
| Out of memory (past UVM spill) | Reduce row count in the generate cell (e.g. ROWS = 500_000 instead of 2_000_000). |
If you see Quota 'NVIDIA_L4_GPUS' exceeded. Limit: 1.0 in region us-central1:
- Free up quota: Stop or delete any existing Colab Enterprise runtimes or other L4 instances in that region.
- Request more L4 quota (e.g. Iowa / us-central1):
  - Go to IAM & Admin → Quotas
  - In the filter box, enter `NVIDIA_L4_GPUS`
  - Select your project and Location: us-central1 (Iowa)
  - Check the box next to NVIDIA L4 GPUs
  - Click Edit quotas (top of page)
  - Enter the new limit (e.g. `2` if you want one more)
  - Add a short justification (e.g. "Need additional L4 for Colab Enterprise data analytics workloads")
  - Click Submit. Small increases are often approved quickly, sometimes automatically.
- Try another region: Create the runtime template in a different region (e.g. `us-east1`, `europe-west1`) where you may have quota. See L4 availability.
- Use T4 instead: Create a template with machine type `n1-standard-4` and Attach GPU: NVIDIA Tesla T4 (1). T4 works with cuDF; the notebook runs unchanged. Then import and connect as usual.
| Issue | Solution |
|---|---|
| "HF_TOKEN not set" or 401 | Add HF_TOKEN in Colab Secrets. Accept the Gemma 2 license first. |
| Out of memory (OOM) | Use a GPU runtime (T4 or better). Gemma 2 2B needs ~4–6 GB VRAM. |
| Model download is slow | First run downloads ~5 GB. Use a stable connection; subsequent runs use cache. |
| Issue | Solution |
|---|---|
| ConnectionRefusedError | NIM may not be ready or not reachable. See steps below. |
| EXTERNAL-IP stuck on <pending> | Wait 2–5 min. If it stays pending, check GCP quotas for forwarding rules. |
| Pod not Running | First deployment can take 10–20 min for model download. Run kubectl get pods -n nim -w to watch. |
If you get ConnectionRefusedError when calling NIM from Colab:
- Check the NIM pod is Running:

  ```bash
  kubectl get pods -n nim
  ```

  Wait until STATUS is `Running` and READY is `1/1`.

- Test from Cloud Shell (port-forward):

  ```bash
  kubectl port-forward svc/my-nim-nim-llm 8000:8000 -n nim &
  sleep 5
  curl -s http://localhost:8000/v1/models
  kill %1
  ```

  If this works, NIM is fine; the problem is external access.

- Check service endpoints:

  ```bash
  kubectl get endpoints -n nim
  ```

  `my-nim-nim-llm` should have at least one address. If it shows `<none>`, the pod is not ready.

- Verify LoadBalancer backends: In GCP Console → Network Services → Load balancing, open the load balancer for the NIM service and confirm the backends are healthy.

- Allow firewall (if needed):

  ```bash
  gcloud compute firewall-rules create allow-nim-8000 \
    --allow tcp:8000 \
    --source-ranges 0.0.0.0/0 \
    --target-tags $(kubectl get nodes -o jsonpath='{.items[0].metadata.labels.cloud\.google\.com\/gke-nodepool}' 2>/dev/null || echo "gke-nim-demo")
  ```
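The manual checks above can be wrapped in a small readiness poll. A sketch with the HTTP probe injectable so the retry logic is easy to test offline (helper names are hypothetical, not part of the repo):

```python
import json
import time
from urllib import error, request

def fetch_models(base_url: str) -> bool:
    """Return True if NIM's /v1/models endpoint answers with a model list."""
    try:
        with request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            return bool(json.loads(resp.read()).get("data"))
    except (error.URLError, OSError, ValueError):
        return False

def wait_for_nim(base_url: str, probe=fetch_models,
                 attempts: int = 20, delay_s: float = 30.0) -> bool:
    """Poll until the endpoint responds or attempts run out."""
    for _ in range(attempts):
        if probe(base_url):
            return True
        time.sleep(delay_s)
    return False

# Demo with a fake probe that succeeds on the third try:
calls = {"n": 0}
def fake_probe(_url):
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_for_nim("http://EXTERNAL_IP:8000", probe=fake_probe, delay_s=0.0)
print(ready, calls["n"])  # → True 3
```

In practice you would call `wait_for_nim` with the default probe right after `kubectl patch`, before opening notebook 03.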
| Issue | Solution |
|---|---|
| Colab disconnects or session dies | Notebooks auto-clone the repo; re-run the setup cell. For long runs, consider Colab Pro for longer sessions. |
| Image or asset not loading | Ensure you cloned the full repo and that docs/ is present. |
See data/README_data_sources.md for a plan to gather real telemetry from nuScenes, KITTI, CARLA, ROS2 bags, and OBD-II CAN data.
- Deploy Faster Generative AI models with NVIDIA NIM on GKE
- Intro to Inference: How to Run AI Models on a GPU
- Speed Up Data Analytics on GPUs
Contributions are welcome. To keep the codebase safe, direct pushes to main are not allowed. All changes must go through pull requests (PRs).
- Fork the repository to your GitHub account.
- Clone your fork locally.
- Create a branch for your change: `git checkout -b feature/your-feature-name`
- Make your changes, commit, and push to your fork.
- Open a pull request from your fork's branch to this repo's `main`.
- Wait for review; maintainers will review and merge after approval.
This workflow ensures every change is reviewed before it reaches main. Repository branch protection rules enforce this.
Idea areas: adapters for nuScenes/KITTI, ROS2 bag-to-Parquet scripts, NIM prompt templates, benchmark results on different GPU configurations.
This project adheres to a code of conduct that all contributors are expected to follow.
- Be respectful and inclusive. Welcome newcomers and diverse perspectives.
- Focus on constructive feedback. Critique ideas, not people.
- No harassment, trolling, or discriminatory behavior.
- Help keep the community safe and productive for everyone.
Violations can be reported to the maintainers. We reserve the right to remove contributions or block users who do not follow these guidelines.
Apache 2.0



