diff --git a/feast/README.md b/feast/README.md new file mode 100644 index 0000000..8fccd41 --- /dev/null +++ b/feast/README.md @@ -0,0 +1,71 @@ +# Feast Feature Store Example + +A complete example of using [Feast](https://docs.feast.dev/) on prokube for +feature management in ML workflows. + +**Scenario:** An online retailer wants to predict whether a customer will +return their next order. The notebook walks through defining customer features, +training a return-risk model, and serving predictions in real time. + +## Quick Start + +1. Feast must be enabled on your cluster (ask your admin) +2. Clone this repository to your notebook server +3. Open `feast_example.ipynb` from the `feast/` directory and run all cells + +The notebook's **Infrastructure setup** cell handles everything automatically: +Redis, secrets, FeatureStore CR, and (for remote mode) NetworkPolicies. + +## Registry modes + +There are two registry modes. Select one in the notebook when prompted: + +| | Local | Remote | +|---|---|---| +| **Registry** | SQLite SQL on `/tmp` (ephemeral) | gRPC server on operator PVC (persistent, shared) | +| **Good for** | Single user, quick iteration | Teams sharing definitions across clients | + +## Files + +``` +feast/ + feast_example.ipynb End-to-end notebook (works with both modes) + redis-cr.yaml Redis instance CR (OpsTree operator) + registry/ + local/ + feast-cr.yaml FeatureStore CR — local SQLite SQL registry + feature_store.yaml Feast SDK config template + README.md Local mode details and trade-offs + remote/ + feast-cr.yaml FeatureStore CR — remote gRPC registry server + feature_store.yaml Feast SDK config template + network-policies.yaml CNI-layer NetworkPolicies for isolation + README.md Remote mode details and trade-offs +``` + +## Architecture + +Feast has three stores: + +| Store | Purpose | Backend | +|-------|---------|---------| +| **Registry** | Feature definitions (entities, feature views, sources). Written on `feast apply`. 
| Local: SQLite SQL file. Remote: gRPC server on operator PVC. | +| **Online store** | Latest feature value per entity. Read on every inference — latency critical. | Redis (your `Redis` CR) | +| **Offline store** | Historical feature records for point-in-time joins during training. | Parquet on PVC | + +``` + ┌──────────────────────────────────────┐ + │ Your Namespace │ + │ │ + │ Redis CR (redis-feast) │ + │ │ + store.apply() ───▶ Registry │ + (notebook) │ local: sqlite:////tmp/registry.db │ + │ remote: gRPC → operator PVC │ + │ │ + materialize ──────▶ Redis online store │ + │ │ + historical ──────▶ Parquet on PVC (offline store) │ + features │ │ + └──────────────────────────────────────┘ +``` diff --git a/feast/feast_example.ipynb b/feast/feast_example.ipynb new file mode 100644 index 0000000..dbac853 --- /dev/null +++ b/feast/feast_example.ipynb @@ -0,0 +1,970 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1eaa631f", + "metadata": {}, + "source": [ + "# Feast Feature Store on prokube\n", + "\n", + "**Scenario:** You work at an online retailer. Your team wants to predict\n", + "whether a customer will return their next order. To do that, you need\n", + "customer-level features (order history, return rates, spending patterns)\n", + "available for both model training and real-time inference.\n", + "\n", + "This notebook walks through the full Feast workflow:\n", + "\n", + "1. **Setup** — install dependencies, configure the Feast client\n", + "2. **Generate data** — simulate customer order history\n", + "3. **Define features** — entities, feature views, and on-demand transformations\n", + "4. **Register** — push definitions to the Feast registry\n", + "5. **Train** — retrieve historical features, preprocess, train a return predictor\n", + "6. **Materialize** — push latest values to Redis for online serving\n", + "7. 
**Serve** — predict return risk for incoming orders in real time\n", + "\n", + "Everything happens inline in this notebook — no terminal needed.\n", + "\n", + "### Prerequisites\n", + "\n", + "- Feast must be enabled on your cluster (ask your admin)\n", + "- Open this notebook from the `feast/` directory of the repository\n", + "\n", + "Run all cells in order. The **Infrastructure setup** cell deploys Redis,\n", + "creates secrets, and deploys the FeatureStore CR automatically.\n" + ] + }, + { + "cell_type": "markdown", + "id": "setup_header", + "metadata": {}, + "source": [ + "---\n", + "## 1. Setup\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b8f4c32", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -q 'feast[redis]' grpcio scikit-learn ipywidgets\n" + ] + }, + { + "cell_type": "markdown", + "id": "ef7e1942", + "metadata": {}, + "source": [ + "### Choose registry mode\n", + "\n", + "Select how this notebook connects to the Feast registry.\n", + "See `registry/local/README.md` and `registry/remote/README.md` for the trade-offs.\n", + "\n", + "Run this cell, make your selection, then run the next cell.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "registry_choice", + "metadata": {}, + "outputs": [], + "source": [ + "import ipywidgets as widgets\n", + "from IPython.display import display\n", + "\n", + "registry_widget = widgets.RadioButtons(\n", + " options=[\n", + " (\n", + " \"Local — SQLite SQL on /tmp \"\n", + " \"(simpler, ODFVs work, ephemeral; use registry/local/feast-cr.yaml)\",\n", + " \"local\",\n", + " ),\n", + " (\n", + " \"Remote — operator gRPC server \"\n", + " \"(persistent, shared across clients; use registry/remote/feast-cr.yaml)\",\n", + " \"remote\",\n", + " ),\n", + " ],\n", + " description=\"Registry:\",\n", + " style={\"description_width\": \"initial\"},\n", + " layout=widgets.Layout(width=\"max-content\"),\n", + ")\n", + "display(registry_widget)\n" + ] + }, + { + 
"cell_type": "markdown", + "id": "infra_setup_header", + "metadata": {}, + "source": [ + "### Infrastructure setup\n", + "\n", + "The cell below deploys everything Feast needs and is safe to re-run — each\n", + "step checks whether the resource already exists and skips it if so.\n", + "\n", + "| Step | What it creates |\n", + "|------|-----------------|\n", + "| 1 | `redis-feast` secret — a random Redis password |\n", + "| 2 | Redis instance (`redis-cr.yaml`) — waits until the pod is Ready |\n", + "| 3 | `feast-redis-config` secret — connection string for the Feast operator |\n", + "| 4 | FeatureStore CR (`registry//feast-cr.yaml`) — waits until Ready |\n", + "| 5 | NetworkPolicies (remote mode only) — restricts registry and Redis to this namespace |\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "infra_setup", + "metadata": {}, + "outputs": [], + "source": [ + "import base64\n", + "import json\n", + "import os\n", + "import secrets as _secrets\n", + "import subprocess\n", + "import tempfile\n", + "import time\n", + "\n", + "import yaml\n", + "\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# Utility helpers used throughout the notebook.\n", + "# ---------------------------------------------------------------------------\n", + "\n", + "def kubectl_json(*args):\n", + " return json.loads(subprocess.check_output([\"kubectl\", *args, \"-o\", \"json\"]))\n", + "\n", + "\n", + "def get_namespace():\n", + " \"\"\"Read the current namespace from the pod's service account.\"\"\"\n", + " try:\n", + " with open(\"/var/run/secrets/kubernetes.io/serviceaccount/namespace\") as f:\n", + " return f.read().strip()\n", + " except FileNotFoundError:\n", + " return subprocess.check_output(\n", + " [\"kubectl\", \"config\", \"view\", \"--minify\", \"-o\", \"jsonpath={..namespace}\"]\n", + " ).decode().strip()\n", + "\n", + "\n", + "def secret_exists(name):\n", + " return subprocess.run([\"kubectl\", 
\"get\", \"secret\", name], capture_output=True).returncode == 0\n", + "\n", + "\n", + "# Verify we are in the right directory — relative apply paths depend on this.\n", + "if not os.path.exists(\"redis-cr.yaml\"):\n", + " raise RuntimeError(\n", + " \"redis-cr.yaml not found. Open this notebook from the feast/ directory.\"\n", + " )\n", + "\n", + "NAMESPACE = get_namespace()\n", + "REGISTRY_MODE = registry_widget.value\n", + "print(f\"Namespace: {NAMESPACE}\")\n", + "print(f\"Registry mode: {REGISTRY_MODE}\")\n", + "print()\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# 1. Redis password secret\n", + "# ---------------------------------------------------------------------------\n", + "if not secret_exists(\"redis-feast\"):\n", + " password = _secrets.token_urlsafe(18)\n", + " subprocess.check_call([\n", + " \"kubectl\", \"create\", \"secret\", \"generic\", \"redis-feast\",\n", + " f\"--from-literal=password={password}\",\n", + " ])\n", + " print(\"Created redis-feast secret\")\n", + "else:\n", + " _pw = kubectl_json(\"get\", \"secret\", \"redis-feast\")\n", + " password = base64.b64decode(_pw[\"data\"][\"password\"]).decode()\n", + " print(\"redis-feast: already exists\")\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# 2. 
Redis instance\n", + "# ---------------------------------------------------------------------------\n", + "subprocess.check_call([\"kubectl\", \"apply\", \"-f\", \"redis-cr.yaml\"])\n", + "print(\"Waiting for Redis pod to be Ready\", end=\"\", flush=True)\n", + "for _ in range(60):\n", + " _r = subprocess.run(\n", + " [\"kubectl\", \"get\", \"pods\", \"-l\", \"app=redis-feast\",\n", + " \"-o\", \"jsonpath={.items[0].status.conditions[?(@.type=='Ready')].status}\"],\n", + " capture_output=True, text=True,\n", + " )\n", + " if _r.stdout.strip() == \"True\":\n", + " break\n", + " time.sleep(5)\n", + " print(\".\", end=\"\", flush=True)\n", + "else:\n", + " raise RuntimeError(\"Redis pod did not become Ready within 5 minutes\")\n", + "print(\" done\")\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# 3. Feast Redis connection secret\n", + "#\n", + "# Key 'redis' holds a YAML snippet — the Feast operator reads it as:\n", + "# yaml.safe_load(secret['redis'])['connection_string']\n", + "# ---------------------------------------------------------------------------\n", + "if not secret_exists(\"feast-redis-config\"):\n", + " _conn = f\"redis-feast.{NAMESPACE}.svc.cluster.local:6379,password={password}\"\n", + " with tempfile.NamedTemporaryFile(mode=\"w\", suffix=\".yaml\", delete=False) as _f:\n", + " _f.write(f\"connection_string: '{_conn}'\\n\")\n", + " _tmp = _f.name\n", + " subprocess.check_call([\n", + " \"kubectl\", \"create\", \"secret\", \"generic\", \"feast-redis-config\",\n", + " f\"--from-file=redis={_tmp}\",\n", + " ])\n", + " os.unlink(_tmp)\n", + " print(\"Created feast-redis-config secret\")\n", + "else:\n", + " print(\"feast-redis-config: already exists\")\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# 4. 
FeatureStore CR\n", + "# ---------------------------------------------------------------------------\n", + "_cr = f\"registry/{REGISTRY_MODE}/feast-cr.yaml\"\n", + "subprocess.check_call([\"kubectl\", \"apply\", \"-f\", _cr])\n", + "print(\"Waiting for FeatureStore to be Ready\", end=\"\", flush=True)\n", + "for _ in range(60):\n", + " _r = subprocess.run(\n", + " [\"kubectl\", \"get\", \"featurestore\", \"my-store\",\n", + " \"-o\", \"jsonpath={.status.clientConfigMap}\"],\n", + " capture_output=True, text=True,\n", + " )\n", + " if _r.stdout.strip():\n", + " break\n", + " time.sleep(5)\n", + " print(\".\", end=\"\", flush=True)\n", + "else:\n", + " raise RuntimeError(\"FeatureStore did not become Ready within 5 minutes\")\n", + "print(\" done\")\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# 5. Network policies (remote mode only)\n", + "#\n", + "# The YAML contains a placeholder — we fill it in at apply time\n", + "# so the namespaceSelector targets this specific namespace. 
This is required\n", + "# for Calico eBPF mode, which does not reliably enforce same-namespace\n", + "# restriction from podSelector: {} alone.\n", + "# ---------------------------------------------------------------------------\n", + "if REGISTRY_MODE == \"remote\":\n", + " with open(\"registry/remote/network-policies.yaml\") as _f:\n", + " _np_yaml = _f.read().replace(\"\", NAMESPACE)\n", + " _proc = subprocess.run(\n", + " [\"kubectl\", \"apply\", \"-f\", \"-\"],\n", + " input=_np_yaml.encode(), capture_output=True,\n", + " )\n", + " if _proc.returncode != 0:\n", + " raise RuntimeError(_proc.stderr.decode())\n", + " print(\"Applied network policies\")\n", + "\n", + "print(\"\\nInfrastructure ready.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "configure_client_header", + "metadata": {}, + "source": [ + "### Configure the Feast client\n", + "\n", + "Reads the FeatureStore CR and the Redis secret, then writes `feature_store.yaml`\n", + "— the SDK config that tells Feast where the registry, online store, and offline\n", + "store live. 
If you restart the kernel, first re-run the registry-choice cell (and make\n", + "your selection) and the infrastructure setup cell, which is safe to re-run.\n", + "This cell uses `kubectl_json`, `NAMESPACE`, and `REGISTRY_MODE`, which those\n", + "cells define.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8804025", + "metadata": {}, + "outputs": [], + "source": [ + "# Find the FeatureStore CR deployed by the setup cell.\n", + "fs_list = kubectl_json(\"get\", \"featurestore\")[\"items\"]\n", + "if not fs_list:\n", + " raise RuntimeError(\n", + " \"No FeatureStore CR found — run the infrastructure setup cell first.\"\n", + " )\n", + "fs = fs_list[0]\n", + "fs_name = fs[\"metadata\"][\"name\"]\n", + "redis_secret_name = (\n", + " fs[\"spec\"][\"services\"][\"onlineStore\"][\"persistence\"][\"store\"][\"secretRef\"][\"name\"]\n", + ")\n", + "redis_secret_key = fs[\"spec\"][\"services\"][\"onlineStore\"][\"persistence\"][\"store\"].get(\n", + " \"secretKeyName\", \"redis\"\n", + ")\n", + "redis_secret = kubectl_json(\"get\", \"secret\", redis_secret_name)\n", + "redis_yaml = base64.b64decode(redis_secret[\"data\"][redis_secret_key]).decode()\n", + "redis_conn = yaml.safe_load(redis_yaml)[\"connection_string\"]\n", + "\n", + "if REGISTRY_MODE == \"remote\":\n", + " # Read the operator-published ConfigMap — it already has project name\n", + " # and registry_type: remote pointing at the gRPC server.\n", + " client_cm_name = fs[\"status\"][\"clientConfigMap\"]\n", + " client_cm = kubectl_json(\"get\", \"cm\", client_cm_name)\n", + " config = yaml.safe_load(client_cm[\"data\"][\"feature_store.yaml\"])\n", + " # Override online/offline store so materialize() works from this notebook.\n", + " config[\"online_store\"] = {\"type\": \"redis\", \"connection_string\": redis_conn}\n", + " config[\"offline_store\"] = {\"type\": \"file\"}\n", + " registry_info = config[\"registry\"][\"path\"]\n", + "else: # local\n", + " # Build feature_store.yaml with a local SQLite SQL registry.\n", + " # SQLite SQL has proper transactional semantics vs the plain file registry\n", + " # (matters when materializing multiple feature views concurrently).\n", +
" # The Feast operator CRD does not expose registry_type: sql, so this is\n", + " # configured in the SDK config rather than the FeatureStore CR.\n", + " config = {\n", + " \"project\": \"retail_features\",\n", + " \"provider\": \"local\",\n", + " \"offline_store\": {\"type\": \"file\"},\n", + " \"online_store\": {\"type\": \"redis\", \"connection_string\": redis_conn},\n", + " \"registry\": {\n", + " \"registry_type\": \"sql\",\n", + " \"path\": \"sqlite:////tmp/registry.db\",\n", + " \"cache_ttl_seconds\": 60,\n", + " },\n", + " \"auth\": {\"type\": \"no_auth\"},\n", + " \"entity_key_serialization_version\": 3,\n", + " }\n", + " registry_info = \"sqlite:////tmp/registry.db (ephemeral — re-run apply() after pod restart)\"\n", + "\n", + "with open(\"feature_store.yaml\", \"w\") as f:\n", + " yaml.safe_dump(config, f, sort_keys=False)\n", + "\n", + "FEAST_PROJECT = config[\"project\"]\n", + "\n", + "print(f\"Registry mode: {REGISTRY_MODE}\")\n", + "print(f\"FeatureStore CR: {fs_name}\")\n", + "print(f\"Project: {FEAST_PROJECT}\")\n", + "print(f\"Namespace: {NAMESPACE}\")\n", + "print(f\"Registry: {registry_info}\")\n", + "print(f\"Online store: redis @ {redis_conn.split(',')[0]}\")\n", + "print(\"\\nfeature_store.yaml written.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "3956d1d1", + "metadata": {}, + "source": [ + "---\n", + "## 2. Generate sample data\n", + "\n", + "We simulate a customer order history table — the kind of data your data\n", + "pipeline would produce daily. 
Each row represents the aggregated stats for\n", + "one customer at one point in time:\n", + "\n", + "| Column | Meaning |\n", + "|--------|--------|\n", + "| `customer_id` | Unique customer identifier |\n", + "| `total_orders` | Total number of orders placed |\n", + "| `total_returns` | Total number of returned orders |\n", + "| `avg_order_value` | Average order value in EUR |\n", + "| `days_since_last_order` | Days since the customer's last order |\n", + "| `returned` | Did the customer return their most recent order? (label) |\n", + "\n", + "In a real project, this would come from your data warehouse or ETL pipeline.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f2b288da", + "metadata": {}, + "outputs": [], + "source": [ + "import datetime\n", + "import os\n", + "\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "np.random.seed(42)\n", + "n_customers = 200\n", + "n_snapshots = 10 # 10 daily snapshots per customer\n", + "n = n_customers * n_snapshots\n", + "now = datetime.datetime.now()\n", + "\n", + "customer_ids = np.repeat(np.arange(1, n_customers + 1), n_snapshots)\n", + "timestamps = []\n", + "for _ in range(n_customers):\n", + " timestamps.extend([now - datetime.timedelta(days=i) for i in range(n_snapshots)])\n", + "\n", + "# Simulate realistic customer stats\n", + "total_orders = np.random.randint(1, 80, n).astype(np.int64)\n", + "total_returns = np.array([\n", + " np.random.binomial(orders, np.random.uniform(0.05, 0.4))\n", + " for orders in total_orders\n", + "]).astype(np.int64)\n", + "avg_order_value = np.random.uniform(15.0, 250.0, n).astype(np.float32)\n", + "days_since_last_order = np.random.randint(0, 90, n).astype(np.int64)\n", + "\n", + "# Label: customers with high return rates and high order values are more\n", + "# likely to return. 
Add noise to keep it realistic.\n", + "return_rate = total_returns / np.maximum(total_orders, 1)\n", + "return_prob = 0.3 * return_rate + 0.002 * avg_order_value / 250.0 + np.random.normal(0, 0.1, n)\n", + "returned = (return_prob > 0.15).astype(np.int64)\n", + "\n", + "customer_df = pd.DataFrame({\n", + " \"customer_id\": customer_ids,\n", + " \"event_timestamp\": timestamps,\n", + " \"total_orders\": total_orders,\n", + " \"total_returns\": total_returns,\n", + " \"avg_order_value\": avg_order_value,\n", + " \"days_since_last_order\": days_since_last_order,\n", + " \"returned\": returned,\n", + " \"created\": timestamps,\n", + "})\n", + "\n", + "os.makedirs(\"data\", exist_ok=True)\n", + "customer_df.to_parquet(\"data/customer_orders.parquet\")\n", + "print(f\"Created {n} rows for {n_customers} customers ({n_snapshots} snapshots each)\")\n", + "print(f\"Rows labelled returned=1: {returned.mean():.1%}\")\n", + "customer_df.head(10)\n" + ] + }, + { + "cell_type": "markdown", + "id": "358c2624", + "metadata": {}, + "source": [ + "---\n", + "## 3. Define features\n", + "\n", + "In Feast, features are defined as Python objects:\n", + "\n", + "- **Entity**: the primary key for lookups (here, `customer_id`).\n", + "- **DataSource**: where the raw feature data lives (parquet, table, etc.).\n", + "- **FeatureView**: declares which columns from the source are features and\n", + " how long they're valid (`ttl`). Feast stores and serves them — but doesn't\n", + " compute anything. Your data pipeline is responsible for producing the data.\n", + "- **FeatureService**: a named bundle of one or more feature views. Consumers\n", + " (e.g. an inference API) reference the service by name instead of listing\n", + " individual features — they don't need to know the internal view structure.\n", + "- **OnDemandFeatureView**: derives new features from other views with a\n", + " Python function that Feast runs at retrieval time, for both training and\n", + " online serving.\n", + "\n", + "Here the derived features `return_rate` (`total_returns / total_orders`)\n", + "and `return_risk` (`return_rate * avg_order_value`) are defined as an\n", + "on-demand feature view, so training and serving share one implementation.\n", + "In Feast 0.63, on-demand feature views may not work reliably with the\n", + "Operator-managed remote registry. If you hit errors in remote mode, switch\n", + "to local mode, or precompute the two columns upstream and add them to the\n", + "FeatureView.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "define_features", + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import timedelta\n", + "\n", + "import pandas as pd\n", + "\n", + "from feast import Entity, FeatureService, FeatureStore, FeatureView, Field, FileSource, ValueType\n", + "from feast.on_demand_feature_view import on_demand_feature_view\n", + "from feast.types import Float32, Int64\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# Entity: the \"primary key\" for feature lookups.\n", + "# When you request features, you provide a customer_id.\n", + "# ---------------------------------------------------------------------------\n", + "customer = Entity(\n", + " name=\"customer_id\",\n", + " value_type=ValueType.INT64,\n", + " description=\"Unique customer identifier\",\n", + ")\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# Data source: points to the parquet file produced by the data pipeline.\n", + "# ---------------------------------------------------------------------------\n", + "customer_orders_source = FileSource(\n", + " path=\"data/customer_orders.parquet\",\n", + " timestamp_field=\"event_timestamp\",\n", + " created_timestamp_column=\"created\",\n", + ")\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# FeatureView: declares which columns from the source are features.\n", + "# These are raw/precomputed values from your data pipeline.\n", + "#\n", + "# Note: the source data also contains \"returned\" (did the customer return\n", + "# their last order?). 
We don't include it here because it's our prediction\n", + "# target — at serving time, we don't know the answer yet. Labels live\n", + "# outside Feast and are joined only during training.\n", + "# ---------------------------------------------------------------------------\n", + "customer_order_stats = FeatureView(\n", + " name=\"customer_order_stats\",\n", + " entities=[customer],\n", + " ttl=timedelta(days=30),\n", + " schema=[\n", + " Field(name=\"total_orders\", dtype=Int64),\n", + " Field(name=\"total_returns\", dtype=Int64),\n", + " Field(name=\"avg_order_value\", dtype=Float32),\n", + " Field(name=\"days_since_last_order\", dtype=Int64),\n", + " ],\n", + " source=customer_orders_source,\n", + " online=True,\n", + ")\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# On-Demand Feature View: computes derived features at request time.\n", + "#\n", + "# Feast calls this function automatically during get_historical_features()\n", + "# and get_online_features() — no precomputation or manual pandas needed.\n", + "#\n", + "# Why on-demand rather than writing to the online store?\n", + "# return_rate and return_risk are simple ratios. 
Computing them on-the-fly\n", + "# is cheap, keeps Redis smaller, and avoids a known Feast ≤0.63 bug where\n", + "# write_to_online_store=True fails with an entity-key serialization error\n", + "# on certain Redis backends.\n", + "# ---------------------------------------------------------------------------\n", + "@on_demand_feature_view(\n", + " sources=[customer_order_stats],\n", + " schema=[\n", + " Field(name=\"return_rate\", dtype=Float32),\n", + " Field(name=\"return_risk\", dtype=Float32),\n", + " ],\n", + ")\n", + "def customer_risk_features(inputs: pd.DataFrame) -> pd.DataFrame:\n", + " df = pd.DataFrame()\n", + " df[\"return_rate\"] = (\n", + " inputs[\"total_returns\"] / inputs[\"total_orders\"].clip(lower=1)\n", + " ).astype(\"float32\")\n", + " df[\"return_risk\"] = (df[\"return_rate\"] * inputs[\"avg_order_value\"]).astype(\"float32\")\n", + " return df\n", + "\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# FeatureService: a named bundle of feature views.\n", + "#\n", + "# Instead of listing individual feature names in every get_online_features()\n", + "# call, consumers reference the service by name. This is especially useful\n", + "# when external services (e.g. an inference API) need features — they only\n", + "# need to know the service name, not the internal view structure.\n", + "# ---------------------------------------------------------------------------\n", + "customer_risk_service = FeatureService(\n", + " name=\"customer_risk_service\",\n", + " features=[customer_order_stats, customer_risk_features],\n", + ")\n", + "\n", + "print(\"Feature definitions created (not yet registered).\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "register_header", + "metadata": {}, + "source": [ + "---\n", + "## 4. Register features\n", + "\n", + "`store.apply()` writes all definitions to the registry. 
After this call,\n", + "the definitions are visible to every client that uses the same registry.\n", + "In remote mode that means every client in the namespace, and the\n", + "definitions persist on the operator-managed registry PVC. In local mode\n", + "they live in `/tmp/registry.db` on this notebook server and are lost when\n", + "the pod restarts.\n", + "\n", + "Re-run this cell after changing a definition, after a pod restart in local\n", + "mode, or after a kernel restart (it also creates the `store` object that\n", + "later cells use).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "apply_features", + "metadata": {}, + "outputs": [], + "source": [ + "from feast import Project\n", + "\n", + "store = FeatureStore(repo_path=\".\")\n", + "\n", + "store.apply([\n", + " Project(name=FEAST_PROJECT),\n", + " customer,\n", + " customer_orders_source,\n", + " customer_order_stats,\n", + " customer_risk_features,\n", + " customer_risk_service,\n", + "])\n", + "\n", + "print(f\"Registered in the {REGISTRY_MODE} registry:\")\n", + "for fv in store.list_feature_views():\n", + " print(f\" FeatureView: {fv.name}\")\n", + "for odfv in store.list_on_demand_feature_views():\n", + " print(f\" OnDemandFV: {odfv.name}\")\n", + "for svc in store.list_feature_services():\n", + " print(f\" FeatureService: {svc.name}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "c7f35090", + "metadata": {}, + "source": [ + "---\n", + "## 5. Retrieve historical features and train a model\n", + "\n", + "`get_historical_features()` performs a **point-in-time join**: for each\n", + "row of the entity DataFrame, it finds the most recent feature values at or\n", + "before that row's timestamp (and within the view's `ttl`). This prevents\n", + "data leakage: you only see features that were available when the event\n", + "occurred.\n", + "\n", + "Because `customer_risk_service` includes the on-demand feature view, Feast\n", + "also computes the two derived columns (`return_rate` and `return_risk`) in\n", + "the same call, using the same function it applies at serving time.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7796c5fc", + "metadata": {}, + "outputs": [], + "source": [ + "# Build a query: \"give me features for these customers, as of right now.\"\n", + "# Each row says: I want to know the feature values for customer X at time T.\n", + "# Feast will find the most recent feature values that were available at that\n", + "# timestamp — this is the \"point-in-time join\" that prevents data leakage.\n", + "#\n", + "# The FeatureService includes the on-demand feature view, so return_rate and\n", + "# return_risk are computed by Feast automatically as part of this call.\n", + "\n", + "all_customer_ids = list(range(1, n_customers + 1))\n", + "query_timestamp = now # \"as of right now\"\n", + "\n", + "entity_df = pd.DataFrame({\n", + " \"customer_id\": all_customer_ids,\n", + " \"event_timestamp\": [query_timestamp] * len(all_customer_ids),\n", + "})\n", + "\n", + "print(f\"Querying features for {len(all_customer_ids)} customers...\")\n", + "\n", + "training_df = store.get_historical_features(\n", + " entity_df=entity_df,\n", + " features=customer_risk_service,\n", + ").to_df()\n", + "\n", + "# return_rate and return_risk are computed by the on-demand feature view —\n", + "# no manual pandas derivation needed here.\n", + "\n", + "print(f\"Retrieved {len(training_df)} rows.\")\n", + "training_df.head(10)\n" + ] + }, + { + "cell_type": "markdown", + "id": "preprocess_header", + "metadata": {}, + "source": [ + "### Preprocessing\n", + "\n", + "Before training, we need to clean up the data:\n", + "\n", + "1. 
**Join the label** — `returned` is our prediction target, not a feature.\n", + " We deliberately keep it out of Feast because at serving time (when a\n", + " customer places a new order) we don't know yet whether they will return\n", + " it — that's what the model predicts. Labels typically come from a\n", + " separate source (e.g. your data warehouse) and are joined only for\n", + " training.\n", + "2. **Filter out new customers** — customers with fewer than 3 orders don't\n", + " have enough history for reliable features. Feeding them into the model\n", + " would add noise.\n", + "3. **Drop nulls** — any rows where Feast couldn't find matching features.\n", + "4. **Normalize** — scale numeric features so the model doesn't overweight\n", + " high-magnitude columns like `avg_order_value`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "preprocess", + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "# --- 1. Join the label from the source data ---\n", + "# Get the most recent snapshot per customer to use as label\n", + "latest_labels = (\n", + " customer_df\n", + " .sort_values(\"event_timestamp\")\n", + " .groupby(\"customer_id\")\n", + " .last()[[\"returned\"]]\n", + " .reset_index()\n", + ")\n", + "training_df = training_df.merge(latest_labels, on=\"customer_id\", how=\"left\")\n", + "\n", + "print(f\"After joining labels: {len(training_df)} rows\")\n", + "\n", + "# --- 2. Filter out new customers (< 3 orders) ---\n", + "before = len(training_df)\n", + "training_df = training_df[training_df[\"total_orders\"] >= 3].copy()\n", + "print(f\"After filtering new customers (< 3 orders): {len(training_df)} rows (dropped {before - len(training_df)})\")\n", + "\n", + "# --- 3. 
Drop nulls ---\n", + "before = len(training_df)\n", + "training_df = training_df.dropna()\n", + "print(f\"After dropping nulls: {len(training_df)} rows (dropped {before - len(training_df)})\")\n", + "\n", + "# --- 4. Normalize features ---\n", + "FEATURE_COLS = [\"total_orders\", \"total_returns\", \"avg_order_value\",\n", + " \"days_since_last_order\", \"return_rate\", \"return_risk\"]\n", + "TARGET = \"returned\"\n", + "\n", + "scaler = StandardScaler()\n", + "X = pd.DataFrame(\n", + " scaler.fit_transform(training_df[FEATURE_COLS]),\n", + " columns=FEATURE_COLS,\n", + " index=training_df.index,\n", + ")\n", + "y = training_df[TARGET]\n", + "\n", + "print(f\"\\nTraining set: {len(X)} samples, {len(FEATURE_COLS)} features\")\n", + "print(f\"Class balance: {y.mean():.1%} returns\")\n", + "X.describe().round(2)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "train_model", + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.metrics import classification_report\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(\n", + " X, y, test_size=0.2, random_state=42, stratify=y\n", + ")\n", + "\n", + "model = LogisticRegression(random_state=42, max_iter=1000)\n", + "model.fit(X_train, y_train)\n", + "\n", + "y_pred = model.predict(X_test)\n", + "print(classification_report(y_test, y_pred, target_names=[\"kept\", \"returned\"]))\n" + ] + }, + { + "cell_type": "markdown", + "id": "51b15c4b", + "metadata": {}, + "source": [ + "---\n", + "## 6. Materialize features to Redis\n", + "\n", + "Materialization copies the latest feature values from the offline store\n", + "(parquet) into Redis for low-latency online serving.\n", + "\n", + "Note: only `FeatureView` data is materialized to Redis. 
The ODFV features\n", + "(`return_rate`, `return_risk`) are computed on-the-fly when you call\n", + "`get_online_features()` — Feast reads the raw values from Redis and applies\n", + "the transformation inline.\n", + "\n", + "In production you would run this on a schedule (e.g. daily after your\n", + "pipeline updates the parquet files).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6183ced7", + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime, timedelta\n", + "\n", + "store.materialize(\n", + " start_date=datetime.now() - timedelta(days=30),\n", + " end_date=datetime.now(),\n", + ")\n", + "print(\"Materialized to Redis.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "66ec423c", + "metadata": {}, + "source": [ + "---\n", + "## 7. Online feature serving — predict return risk\n", + "\n", + "A customer just placed an order. Your order service asks Feast for their\n", + "features: Feast reads the raw values from Redis and computes the same on-demand\n", + "columns we used in training. The service feeds the result into the model and\n", + "decides whether to flag the order for proactive customer service.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6982cdba", + "metadata": {}, + "outputs": [], + "source": [ + "# Simulate: these 3 customers just placed a new order\n", + "customers_with_new_orders = [{\"customer_id\": 5}, {\"customer_id\": 42}, {\"customer_id\": 137}]\n", + "\n", + "# Use the FeatureService instead of listing individual features.\n", + "# The service bundles all views — consumers don't need to know the internals.\n", + "# The on-demand feature view (return_rate, return_risk) is applied automatically\n", + "# by the SDK at request time before returning results.\n", + "online_features = store.get_online_features(\n", + " features=customer_risk_service,\n", + " entity_rows=customers_with_new_orders,\n", + ").to_dict()\n", + "\n", + "online_df = pd.DataFrame(online_features)\n", + "\n", +
"print(\"Online features (raw from Redis + on-demand derived columns):\")\n", + "online_df\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "72e36e86", + "metadata": {}, + "outputs": [], + "source": [ + "# Run inference: predict return probability and flag high-risk orders\n", + "X_inference = pd.DataFrame(\n", + " scaler.transform(online_df[FEATURE_COLS]),\n", + " columns=FEATURE_COLS,\n", + ")\n", + "\n", + "return_probabilities = model.predict_proba(X_inference)[:, 1]\n", + "\n", + "RISK_THRESHOLD = 0.5\n", + "\n", + "print(\"\\n--- Return Risk Assessment ---\")\n", + "for cid, prob in zip(online_df[\"customer_id\"], return_probabilities):\n", + " flag = \"HIGH RISK\" if prob > RISK_THRESHOLD else \"low risk\"\n", + " print(f\" Customer {cid}: return probability = {prob:.1%} [{flag}]\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "72a52751", + "metadata": {}, + "source": [ + "---\n", + "## Summary\n", + "\n", + "| Step | API | What happens |\n", + "|------|-----|-------------|\n", + "| Define | Python objects (Entity, FeatureView, OnDemandFeatureView, FeatureService) | Declare what features exist and how they are grouped |\n", + "| Register | `store.apply([...])` | Write definitions to the registry (local SQLite file or the operator's gRPC server, depending on the mode) |\n", + "| Train | `store.get_historical_features(features=service)` | Point-in-time join from parquet |\n", + "| Preprocess | pandas / sklearn | Join labels, filter, drop nulls, normalize |\n", + "| Materialize | `store.materialize()` | Push latest raw values to Redis |\n", + "| Serve | `store.get_online_features(features=service)` | Low-latency lookup from Redis, with on-demand features computed at request time |\n", + "\n", + "### FeatureService\n", + "\n", + "A `FeatureService` bundles one or more feature views under a single name.\n", + "Clients (notebooks, inference services) reference the service name instead\n", + "of listing individual feature names — they don't need to know the internal\n", + "view structure.
Define once, use everywhere.\n", + "\n", + "### About the registry\n", + "\n", + "In **remote** mode the registry is the gRPC service deployed by the Feast\n", + "Operator (`feast--registry`), backed by the FeatureStore CR's PVC. Definitions\n", + "you `apply()` are shared with every client in the namespace and survive pod\n", + "restarts. In **local** mode it is a SQLite file (`/tmp/registry.db` by default)\n", + "that is private to this notebook server and lost on pod restart. In both modes,\n", + "feature *data* lives in Redis (online) and parquet on the PVC (offline).\n" + ] + }, + { + "cell_type": "markdown", + "id": "production_setup", + "metadata": {}, + "source": [ + "---\n", + "## Production hardening\n", + "\n", + "In remote mode this notebook already uses the registry served by the Feast\n", + "Operator. To make the rest of the workflow production-grade:\n", + "\n", + "### 1. Feature definitions live in Git\n", + "\n", + "Instead of defining features in a notebook, put them in a `features.py`\n", + "file in a Git repository. The Feast Operator clones the repo on startup\n", + "and runs `feast apply` automatically:\n", + "\n", + "```yaml\n", + "# feast-cr.yaml\n", + "spec:\n", + " feastProject: retail_features\n", + " feastProjectDir:\n", + " git:\n", + " url: https://github.com/your-org/feast-feature-repo\n", + " ref: main # or pin to a commit SHA\n", + "```\n", + "\n", + "Feature definitions are then version-controlled, reviewed via PRs, and\n", + "automatically deployed when the pod starts. Notebooks shift from being\n", + "authors of definitions to consumers of them.\n", + "\n", + "### 2. Materialization runs as a CronJob\n", + "\n", + "Instead of running `store.materialize()` from a notebook, set up a\n", + "Kubernetes CronJob that runs on a schedule (e.g. daily, after your ETL\n", + "pipeline refreshes the customer data). The Feast Operator can manage this\n", + "via the `batchEngine` config.\n", + "\n", + "### 3.
Use a SQL-backed registry\n", + "\n", + "Switch the registry persistence from a file on the PVC to PostgreSQL, which\n", + "is better suited to multi-replica feast-server deployments and concurrent writes.\n", + "\n", + "```yaml\n", + "spec:\n", + " services:\n", + " registry:\n", + " local:\n", + " server: {}\n", + " persistence:\n", + " store:\n", + " type: sql\n", + " secretRef:\n", + " name: feast-registry-db\n", + "```\n", + "\n", + "### 4. Component overview\n", + "\n", + "| Component | This notebook | Production |\n", + "|-----------|---------------|------------|\n", + "| Registry | Local: SQLite file on `/tmp`; remote: gRPC server, file on PVC | gRPC server, PostgreSQL |\n", + "| Online Store | Redis (operator-managed) | Same |\n", + "| Offline Store | Parquet on PVC | Parquet on PVC, or S3/MinIO |\n", + "| Feature definitions | Defined in notebook | Defined in Git, applied on operator startup |\n", + "| Materialization | Run from notebook | CronJob |\n", + "| Feast Server | 1 replica | Multi-replica with HPA |\n", + "\n", + "For the full production deployment guide, see the\n", + "[Feast Production Deployment Topologies](https://docs.feast.dev/how-to-guides/production-deployment-topologies)\n", + "documentation.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/feast/redis-cr.yaml b/feast/redis-cr.yaml new file mode 100644 index 0000000..fd5846c --- /dev/null +++ b/feast/redis-cr.yaml @@ -0,0 +1,36 @@ +# Redis instance for Feast online store. +# Replace with your Kubeflow profile namespace.
+# +# Before applying, create the password secret: +# kubectl create secret generic redis-feast \ +# -n \ +# --from-literal=password=$(openssl rand -base64 24 | tr -d '/') +apiVersion: redis.redis.opstreelabs.in/v1beta2 +kind: Redis +metadata: + name: redis-feast +spec: + kubernetesConfig: + image: quay.io/opstree/redis:v7.0.15 + imagePullPolicy: IfNotPresent + redisSecret: + name: redis-feast + key: password + resources: + requests: + cpu: 100m + memory: 128Mi + limits: + cpu: 200m + memory: 256Mi + podSecurityContext: + fsGroup: 1000 + runAsUser: 1000 + storage: + volumeClaimTemplate: + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 1Gi diff --git a/feast/registry/local/README.md b/feast/registry/local/README.md new file mode 100644 index 0000000..2e73b13 --- /dev/null +++ b/feast/registry/local/README.md @@ -0,0 +1,36 @@ +# Local registry mode + +The notebook uses Feast's SQL registry backed by a **local SQLite file**, +written directly by the Feast SDK. No gRPC server is involved. + +## How it works + +- `feast apply` writes feature definitions to `sqlite:////tmp/registry.db` + (or a mounted PVC path — see `feature_store.yaml`). +- The registry is read back from the same file. No network hop, no protocol + negotiation. +- ODFVs (on-demand feature views) work without workarounds. + +## Trade-offs vs remote mode + +| | Local (this folder) | Remote | +|---|---|---| +| Registry persistence | Ephemeral (`/tmp`) by default | Persistent on operator PVC | +| Shared across clients | No — each notebook has its own `/tmp` | Yes — all clients in the namespace see the same definitions | +| Setup complexity | Low | Higher | + +## When to use + +Use local mode when: +- You are the only user of this feature store +- You are experimenting or iterating quickly + +Use remote mode when you need definitions to persist across pod restarts or +be shared with other clients in the namespace.
+ +## Files + +| File | Purpose | +|------|---------| +| `feast-cr.yaml` | FeatureStore CR — no `server: {}`, registry PVC only | +| `feature_store.yaml` | Feast SDK config template (notebook writes this) | diff --git a/feast/registry/local/feast-cr.yaml b/feast/registry/local/feast-cr.yaml new file mode 100644 index 0000000..74b109e --- /dev/null +++ b/feast/registry/local/feast-cr.yaml @@ -0,0 +1,52 @@ +# FeatureStore CR — local registry mode. +# +# The operator will create: +# - A Feast deployment + service (online feature server) +# - PVCs for the registry and offline data store +# - A ConfigMap (feast--client) with client connection info +# +# The registry PVC is provisioned, but the notebook bypasses the gRPC server: +# it uses a SQLite registry file directly (/tmp by default, or this PVC if mounted). +# +# Prerequisites: +# - feast-redis-config secret must exist in your namespace (see README) +apiVersion: feast.dev/v1 +kind: FeatureStore +metadata: + name: my-store +spec: + feastProject: retail_features + services: + runFeastApplyOnInit: false + securityContext: + runAsUser: 0 + registry: + local: + # No server: {} here — the notebook uses the SQLite file directly. + persistence: + file: + pvc: + mountPath: /data/registry + create: + # storageClassName: default # omit to use cluster default + resources: + requests: + storage: 1Gi + offlineStore: + persistence: + file: + type: file + pvc: + mountPath: /data/offline + create: + storageClassName: mayastor-no-redundancy # adjust for your cluster + resources: + requests: + storage: 10Gi + onlineStore: + persistence: + store: + type: redis + secretRef: + name: feast-redis-config + secretKeyName: redis diff --git a/feast/registry/local/feature_store.yaml b/feast/registry/local/feature_store.yaml new file mode 100644 index 0000000..95145d2 --- /dev/null +++ b/feast/registry/local/feature_store.yaml @@ -0,0 +1,29 @@ +# Registry: local SQLite SQL on /tmp (ephemeral) or PVC (persistent).
+# +# SQLite SQL gives proper transactional semantics over the plain file registry +# with identical maintenance burden — no server, just a file. +# +# Path options: +# Ephemeral (survives the notebook session, lost on pod restart): +# sqlite:////tmp/registry.db +# Persistent (mount the registry PVC at /data/registry first): +# sqlite:////data/registry/registry.db +# +# Note: four slashes = absolute path (SQLAlchemy convention for SQLite). +# PVC name: feast--registry +# +# The notebook writes this file automatically — you don't need to edit it. +project: retail_features +provider: local +offline_store: + type: file +online_store: + type: redis + connection_string: ":6379,password=" +registry: + registry_type: sql + path: sqlite:////tmp/registry.db + cache_ttl_seconds: 60 +auth: + type: no_auth +entity_key_serialization_version: 3 diff --git a/feast/registry/remote/README.md b/feast/registry/remote/README.md new file mode 100644 index 0000000..30ab35f --- /dev/null +++ b/feast/registry/remote/README.md @@ -0,0 +1,60 @@ +# Remote registry mode + +The notebook talks to the registry via the **gRPC server** that the Feast +Operator exposes from the FeatureStore CR. Feature definitions you `apply()` +persist on the operator-managed PVC and are visible to every other client in +the namespace. + +## How it works + +- `feast apply` sends definitions to the registry gRPC server over the + operator's native registry Service. +- All clients in the namespace share the same registry — no need to re-run + `apply()` after a notebook restart. +- The operator publishes a `feast--client` ConfigMap with the + connection details; the notebook reads it automatically. 
+ +## Trade-offs vs local mode + +| | Remote (this folder) | Local | +|---|---|---| +| Registry persistence | Persistent on operator PVC | Ephemeral (`/tmp`) by default | +| Shared across clients | Yes | No | +| Setup complexity | Higher | Low | + +## Network policies + +`network-policies.yaml` restricts access to the registry and Redis to pods +within the same namespace. This is defense-in-depth alongside the +namespace-isolation AuthorizationPolicy that the Kubeflow profile controller +creates — NetworkPolicies are enforced at the CNI layer independently of the +Istio mesh. + +| Policy | Protects | Port | +|--------|----------|------| +| `feast-my-store-registry-ingress` | Feast registry gRPC server | 6570 | +| `redis-feast-ingress` | Redis online store | 6379 | + +## Setup + +The notebook's **Infrastructure setup** cell performs these steps for you. To +do them by hand, create Redis (`redis-cr.yaml`) and the `feast-redis-config` secret, then: + +```bash +# 1. Deploy the FeatureStore CR +kubectl apply -f registry/remote/feast-cr.yaml +kubectl get featurestore -w # wait until Ready + +# 2. Apply the network policies +kubectl apply -f registry/remote/network-policies.yaml +``` + +Then open the notebook and select **Remote** when prompted. + +## Files + +| File | Purpose | +|------|---------| +| `feast-cr.yaml` | FeatureStore CR with `server: {}` to enable the gRPC registry | +| `feature_store.yaml` | Feast SDK config template (notebook writes this from the operator ConfigMap) | +| `network-policies.yaml` | CNI-layer NetworkPolicies for registry and Redis isolation | diff --git a/feast/registry/remote/feast-cr.yaml b/feast/registry/remote/feast-cr.yaml new file mode 100644 index 0000000..4074d6e --- /dev/null +++ b/feast/registry/remote/feast-cr.yaml @@ -0,0 +1,56 @@ +# FeatureStore CR — remote registry mode.
+# +# The operator will create: +# - A Feast deployment + service (online feature server) +# - PVCs for the SQLite registry and offline data store +# - A ConfigMap (feast--client) with client connection info +# +# This CR enables the registry gRPC server (services.registry.local.server) +# so notebooks and other clients can read and write feature definitions +# remotely. Definitions persist on the operator-managed PVC across restarts +# and are shared with every client in the namespace. +# +# Prerequisites: +# - feast-redis-config secret must exist in your namespace (see top-level README) +apiVersion: feast.dev/v1 +kind: FeatureStore +metadata: + name: my-store +spec: + feastProject: retail_features + services: + runFeastApplyOnInit: false + securityContext: + runAsUser: 0 + registry: + local: + # Expose the registry as a gRPC server so notebooks/clients can read + # and write feature definitions remotely (production pattern). + server: {} + persistence: + file: + pvc: + mountPath: /data/registry + create: + # storageClassName: default # omit to use cluster default + resources: + requests: + storage: 1Gi + offlineStore: + persistence: + file: + type: file + pvc: + mountPath: /data/offline + create: + storageClassName: mayastor-no-redundancy # adjust for your cluster + resources: + requests: + storage: 10Gi + onlineStore: + persistence: + store: + type: redis + secretRef: + name: feast-redis-config + secretKeyName: redis diff --git a/feast/registry/remote/feature_store.yaml b/feast/registry/remote/feature_store.yaml new file mode 100644 index 0000000..04b651b --- /dev/null +++ b/feast/registry/remote/feature_store.yaml @@ -0,0 +1,18 @@ +# Registry: remote gRPC server managed by the Feast Operator. +# +# The notebook reads this from the operator-published ConfigMap +# (feast--client) and overrides online_store with a direct Redis +# connection for materialization. You don't need to edit this file manually. 
+project: retail_features +provider: local +offline_store: + type: file +online_store: + type: redis + connection_string: ":6379,password=" +registry: + registry_type: remote + path: grpc://feast-my-store-registry..svc.cluster.local:6570 +auth: + type: no_auth +entity_key_serialization_version: 3 diff --git a/feast/registry/remote/network-policies.yaml b/feast/registry/remote/network-policies.yaml new file mode 100644 index 0000000..6c90076 --- /dev/null +++ b/feast/registry/remote/network-policies.yaml @@ -0,0 +1,60 @@ +# NetworkPolicies for Feast registry and online store isolation. +# +# These policies restrict registry and Redis access to pods within the same +# namespace, providing defense-in-depth alongside Istio AuthorizationPolicies. +# NetworkPolicies operate at the CNI layer and enforce isolation regardless of +# sidecar configuration. +# +# Note: the namespaceSelector inside each ingress rule is intentional. +# Calico eBPF mode does not reliably enforce same-namespace restriction when +# only podSelector: {} is used in the from clause — an explicit +# namespaceSelector is required to lock access to a specific namespace. +# +# The placeholder is filled in by the notebook at apply time +# (it reads the namespace from the pod's service account token). +# +# --------------------------------------------------------------------------- + +# Feast registry server: allow ingress on port 6570 only from pods in the +# same namespace. Pods in other namespaces are blocked at the CNI layer. 
+apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: feast-my-store-registry-ingress +spec: + podSelector: + matchLabels: + feast.dev/name: my-store # label applied by the Feast operator to the server pod + policyTypes: + - Ingress + ingress: + - from: + - podSelector: {} + namespaceSelector: + matchLabels: + kubernetes.io/metadata.name: + ports: + - port: 6570 + protocol: TCP +--- +# Redis online store: allow ingress on port 6379 only from pods in the +# same namespace. +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: redis-feast-ingress +spec: + podSelector: + matchLabels: + app: redis-feast # label applied by the OpsTree Redis operator (app: ) + policyTypes: + - Ingress + ingress: + - from: + - podSelector: {} + namespaceSelector: + matchLabels: + kubernetes.io/metadata.name: + ports: + - port: 6379 + protocol: TCP