Commit f3432c8

feat: Feast First-Class LabelView Implementation
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
1 parent 153a279 commit f3432c8

50 files changed

Lines changed: 3283 additions & 488 deletions

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
* [Data ingestion](getting-started/concepts/data-ingestion.md)
* [Entity](getting-started/concepts/entity.md)
* [Feature view](getting-started/concepts/feature-view.md)
+* [\[Alpha\] Label view](getting-started/concepts/label-view.md)
* [Feature retrieval](getting-started/concepts/feature-retrieval.md)
* [Point-in-time joins](getting-started/concepts/point-in-time-joins.md)
* [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md)

docs/adr/ADR-0012-label-view.md

Lines changed: 426 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
# Label View

{% hint style="info" %}
**\[Alpha]** Label views are an alpha feature. The API may change in future releases.
{% endhint %}

A **label view** is a Feast primitive for *mutable* labels and annotations, kept separate from the *immutable* feature data stored in regular [feature views](feature-view.md). The separation reflects a simple principle: observational data (features) is append-only, while judgments about that data (labels, scores, reward signals) are revised over time by multiple independent sources.

Label views are especially useful in **RLHF/reward-modeling pipelines**, **multi-annotator workflows**, and **safety-monitoring systems**, where different labelers (human reviewers, automated scanners, reward models) independently write labels for the same entity keys.
## Key Capabilities

- **Multi-labeler support**: Multiple independent labelers can write labels for the same entity key. A configurable `labeler_field` records which source wrote each label.
- **Conflict resolution policies**: A `ConflictPolicy` (last-write-wins, labeler priority, or majority vote) declares how disagreements between labelers should be resolved. The policy is stored but not yet enforced; see [Alpha limitations](#alpha-limitations) below.
- **History retention**: Optionally retain the full history of label writes per entity key, not just the latest value. This setting is also stored but not yet enforced; see [Alpha limitations](#alpha-limitations) below.
- **Reference feature view**: Optionally link a label view to the `FeatureView` whose entities it annotates, for documentation and lineage.
- **Flexible ingestion**: Labels can be written in real time with `FeatureStore.push()` through a `PushSource`, or loaded in bulk from a historical table (Snowflake, Spark, Parquet, etc.) by setting a `batch_source` and running `feast materialize`.
- **FeatureService composability**: Label views can be included alongside regular feature views in a `FeatureService`, so training pipelines can retrieve features and their labels together.
## When to use Label Views

| Use a **FeatureView** when… | Use a **LabelView** when… |
|---|---|
| Data is observational and append-only (e.g. driver trip counts, page views) | Data is a judgment or annotation about an entity (e.g. reward labels, safety scores) |
| A single source of truth writes the data | Multiple labelers may write conflicting values for the same key |
| History is naturally a time series | You need explicit control over whether history is retained or overwritten |
## Defining a Label View

```python
from datetime import timedelta

from feast import Entity, FeatureService, Field, PushSource
from feast.labeling import ConflictPolicy, LabelView
from feast.types import Float32, String

# The entity being labeled: one row per agent interaction.
interaction = Entity(
    name="interaction",
    join_keys=["interaction_id"],
)

# Push source through which labelers write labels in real time.
label_source = PushSource(
    name="label_push_source",
    schema=[
        Field(name="interaction_id", dtype=String),
        Field(name="reward_label", dtype=String),
        Field(name="safety_score", dtype=Float32),
        Field(name="labeler", dtype=String),
    ],
)

interaction_labels = LabelView(
    name="interaction_labels",
    entities=[interaction],
    ttl=timedelta(days=90),
    schema=[
        Field(name="interaction_id", dtype=String),
        Field(name="reward_label", dtype=String),
        Field(name="safety_score", dtype=Float32),
        Field(name="labeler", dtype=String),
    ],
    source=label_source,
    labeler_field="labeler",  # column recording which labeler wrote each row
    conflict_policy=ConflictPolicy.LAST_WRITE_WINS,  # stored, not yet enforced (alpha)
    retain_history=True,  # stored, not yet enforced (alpha)
    reference_feature_view="interaction_history",  # FeatureView whose entities these labels annotate
    description="Reward and safety labels on agent interactions.",
    owner="ml-safety-team@example.com",
)
```
## Conflict Policies

The `ConflictPolicy` enum declares how conflicting labels from different labelers are **intended** to be resolved at read time. As described under [Alpha limitations](#alpha-limitations), the policy is not enforced yet.

| Policy | Intended behavior |
|---|---|
| `LAST_WRITE_WINS` | The most recently written label for a given entity key takes precedence, regardless of which labeler wrote it. This is the default. |
| `LABELER_PRIORITY` | Labels are ranked by a pre-configured labeler priority order. Higher-priority labelers override lower-priority ones. |
| `MAJORITY_VOTE` | The label value that appears most frequently across all labelers is selected. Useful for consensus-based annotation workflows. |
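Since the runtime does not enforce these policies yet, the following is only a minimal, Feast-independent sketch of their intended semantics over all rows written for one entity key. `Policy`, `LabelRow`, `resolve`, and the `priority` tuple are hypothetical names. The table above does not say how the labeler priority order is configured or how majority-vote ties are broken; this sketch assumes ties go to the most recently written value.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Policy(Enum):
    """Hypothetical stand-in for feast.labeling.ConflictPolicy, so the sketch runs without Feast."""
    LAST_WRITE_WINS = "last_write_wins"
    LABELER_PRIORITY = "labeler_priority"
    MAJORITY_VOTE = "majority_vote"


@dataclass
class LabelRow:
    labeler: str
    value: str
    written_at: datetime


def resolve(rows, policy, priority=()):
    """Pick one label value from all rows written for a single entity key."""
    if not rows:
        return None
    if policy is Policy.LAST_WRITE_WINS:
        # Newest write wins, whoever wrote it.
        return max(rows, key=lambda r: r.written_at).value
    if policy is Policy.LABELER_PRIORITY:
        # Earlier position in `priority` means higher priority; unlisted labelers rank last.
        rank = {name: i for i, name in enumerate(priority)}

        def rank_of(r):
            return rank.get(r.labeler, len(rank))

        best = min(rank_of(r) for r in rows)
        # Among rows from the best-ranked labeler(s), fall back to the newest write.
        return max((r for r in rows if rank_of(r) == best), key=lambda r: r.written_at).value
    if policy is Policy.MAJORITY_VOTE:
        counts = Counter(r.value for r in rows)
        top = max(counts.values())
        # Assumption: ties go to the most recently written of the tied values.
        return max((r for r in rows if counts[r.value] == top), key=lambda r: r.written_at).value
    raise ValueError(f"unknown policy: {policy}")


rows = [
    LabelRow("human_reviewer", "negative", datetime(2025, 1, 15, 9)),
    LabelRow("reward_model", "negative", datetime(2025, 1, 15, 10)),
    LabelRow("nemo_guardrails", "positive", datetime(2025, 1, 15, 11)),
]
for policy in Policy:
    print(policy.name, resolve(rows, policy, priority=("human_reviewer", "nemo_guardrails")))
# LAST_WRITE_WINS positive
# LABELER_PRIORITY negative
# MAJORITY_VOTE negative
```

The three policies can disagree on the same rows, which is why the choice matters once enforcement lands: here the newest write is `positive`, while both the priority order and the vote favor `negative`.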
## Alpha Limitations

{% hint style="warning" %}
The following capabilities are **defined and stored** in the label-view metadata but are **not yet enforced** by the Feast runtime. They are persisted in the registry so that future releases can activate them without a schema migration.
{% endhint %}

### Conflict-policy enforcement at read time

`conflict_policy` is stored as part of the `LabelView` definition, but it is **not enforced** during `get_online_features`. The online store currently returns the last-written row for a given entity key, whichever policy is configured.

Enforcing the policy will require changes to the online-store read path so that the store can consider multiple rows per entity key and apply the configured strategy. Because the online store currently keeps only one row per key (see the next limitation), enforcement also depends on history retention being implemented.

### History retention at write time

`retain_history` is stored but **not acted on**: the online store always overwrites the previous value when a new label is written for the same entity key.

Implementing retention will require changes to the online-store write path so that it appends rather than upserts, along with a compaction or eviction strategy for old entries.
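To make the difference concrete, here is a toy, in-memory sketch contrasting today's upsert behavior with an append-plus-eviction write path. `ToyOnlineStore` and `max_history` are hypothetical; a count-based cap is just one possible eviction strategy (a time-based cutoff would be another), and nothing here reflects how Feast's online stores are implemented.

```python
class ToyOnlineStore:
    """Hypothetical in-memory stand-in for an online store; not a Feast API."""

    def __init__(self, retain_history=False, max_history=3):
        if max_history < 1:
            raise ValueError("max_history must be at least 1")
        self.retain_history = retain_history
        self.max_history = max_history
        self._rows = {}  # entity key -> list of rows, oldest first

    def write(self, entity_key, row):
        if not self.retain_history:
            # Current alpha behavior: upsert, replacing whatever was stored before.
            self._rows[entity_key] = [row]
            return
        # Retention: append, then evict the oldest rows beyond max_history.
        history = self._rows.setdefault(entity_key, [])
        history.append(row)
        del history[:-self.max_history]

    def latest(self, entity_key):
        rows = self._rows.get(entity_key)
        return rows[-1] if rows else None

    def history(self, entity_key):
        return list(self._rows.get(entity_key, []))


upsert_store = ToyOnlineStore(retain_history=False)
append_store = ToyOnlineStore(retain_history=True, max_history=2)
for s in (upsert_store, append_store):
    for label in ("positive", "negative", "positive"):
        s.write("int-001", {"labeler": "nemo_guardrails", "reward_label": label})

print([r["reward_label"] for r in upsert_store.history("int-001")])  # ['positive']
print([r["reward_label"] for r in append_store.history("int-001")])  # ['negative', 'positive']
```

Both stores answer `latest()` identically; only the retaining store keeps enough rows for a read-time conflict policy to choose among.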
### Batch materialization

Batch materialization behavior depends on whether the label view has a `batch_source`:

- **With a `batch_source`** (either a direct `DataSource` or a `PushSource` that wraps one): `feast materialize` and `feast materialize-incremental` include the label view, reading historical label rows from the batch source and loading them into the online store. This is the recommended path for teams with large pre-existing label tables (e.g. a Snowflake or Spark table of loan-default outcomes).
- **Without a `batch_source`** (push-only label views): the label view is excluded from `feast materialize`, and labels must arrive via `FeatureStore.push()`. Explicitly materializing such a label view by name raises an error.
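The inclusion rule above fits in a few lines of plain Python. This is a sketch of the documented behavior only: `Source`, `LabelViewSpec`, `has_batch_source`, `select_for_materialize`, and the error message are hypothetical, and Feast's own check may be structured quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Hypothetical stand-in for a Feast DataSource or PushSource."""
    name: str
    is_push: bool = False
    batch_source: Optional["Source"] = None


@dataclass
class LabelViewSpec:
    """Hypothetical, minimal description of a label view."""
    name: str
    source: Source


def has_batch_source(view):
    # A direct (non-push) source is batch-readable itself; a push source needs a wrapped batch_source.
    if view.source.is_push:
        return view.source.batch_source is not None
    return True


def select_for_materialize(views, names=None):
    """Return the names of label views that materialization would process."""
    if names is None:
        # No explicit names: push-only label views are skipped.
        return [v.name for v in views if has_batch_source(v)]
    by_name = {v.name: v for v in views}
    for name in names:
        if not has_batch_source(by_name[name]):
            # Explicitly requesting a push-only view is an error.
            raise ValueError(f"label view {name!r} has no batch_source; push its labels instead")
    return list(names)


warehouse = Source("loan_outcomes_table")
views = [
    LabelViewSpec("loan_default_labels", warehouse),
    LabelViewSpec("interaction_labels", Source("label_push_source", is_push=True)),
    LabelViewSpec("review_labels", Source("review_push_source", is_push=True, batch_source=warehouse)),
]
print(select_for_materialize(views))  # ['loan_default_labels', 'review_labels']
```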
## Using with Feature Services

Label views can be composed with regular feature views in a `FeatureService`, so downstream consumers (training pipelines, batch scoring jobs) get features and labels in a single retrieval call:

```python
training_service = FeatureService(
    name="interaction_training_service",
    features=[
        interaction_history,  # regular FeatureView with immutable features, defined elsewhere in the repo
        interaction_labels,  # LabelView with mutable reward labels
    ],
)
```
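When such a service is used to build a training set, each entity row should be joined to the label that was known at that row's timestamp, so that later relabeling does not leak into earlier examples. The sketch below mimics that point-in-time lookup for a single label column in plain Python, assuming Feast's usual retrieval semantics (newest value at or before the entity timestamp, ignored once older than the view's `ttl`). `label_as_of` is a hypothetical helper, and it applies no conflict policy.

```python
from datetime import datetime, timedelta


def label_as_of(label_rows, entity_id, event_ts, ttl):
    """Newest reward_label for entity_id written at or before event_ts and no older than ttl."""
    candidates = [
        r for r in label_rows
        if r["interaction_id"] == entity_id
        and r["event_timestamp"] <= event_ts  # never look into the future
        and event_ts - r["event_timestamp"] <= ttl  # drop labels older than the view's ttl
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["event_timestamp"])["reward_label"]


labels = [
    {"interaction_id": "int-001", "reward_label": "negative", "event_timestamp": datetime(2025, 1, 10)},
    {"interaction_id": "int-001", "reward_label": "positive", "event_timestamp": datetime(2025, 1, 15)},
]
ttl = timedelta(days=90)  # same ttl as the LabelView definition earlier on this page
for ts in (datetime(2025, 1, 12), datetime(2025, 1, 20), datetime(2025, 6, 1)):
    print(ts.date(), label_as_of(labels, "int-001", ts, ttl))
# 2025-01-12 negative
# 2025-01-20 positive
# 2025-06-01 None
```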
## Pushing Labels

Labels are typically written via `FeatureStore.push()` using the label view's `PushSource`:

```python
import pandas as pd

from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo/")

labels_df = pd.DataFrame({
    "interaction_id": ["int-001", "int-002"],
    "reward_label": ["positive", "negative"],
    "safety_score": [0.95, 0.12],
    "labeler": ["nemo_guardrails", "nemo_guardrails"],
    "event_timestamp": pd.to_datetime(["2025-01-15", "2025-01-15"]),
})

store.push("label_push_source", labels_df)
```

By default, `push()` writes to the online store only, which makes the labels available for real-time serving. To also write them to the offline store for historical training-dataset generation, pass `to=PushMode.ONLINE_AND_OFFLINE` (imported from `feast.data_source`); the offline write targets the push source's `batch_source`, so this applies only to label views backed by one.
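Because per-labeler conflict resolution depends on knowing who wrote each label, it can be worth checking a batch of label rows before pushing it. The sketch below is a hypothetical pre-push check: `validate_label_rows` and `REQUIRED_COLUMNS` are illustrative names, not Feast APIs, and the column list simply mirrors the push example above. It is not a substitute for whatever validation Feast itself performs.

```python
from datetime import datetime

# Columns used in the push example above; an illustrative list, not read from Feast.
REQUIRED_COLUMNS = ("interaction_id", "reward_label", "safety_score", "labeler", "event_timestamp")


def validate_label_rows(rows, labeler_field="labeler"):
    """Return human-readable problems; an empty list means the rows look ready to push."""
    required = list(dict.fromkeys((*REQUIRED_COLUMNS, labeler_field)))
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in required if c not in row]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue
        if not row[labeler_field]:
            # Without a labeler, per-labeler conflict resolution has nothing to work with.
            problems.append(f"row {i}: empty {labeler_field}")
        if not isinstance(row["event_timestamp"], datetime):
            problems.append(f"row {i}: event_timestamp is not a datetime")
    return problems


batch = [
    {"interaction_id": "int-001", "reward_label": "positive", "safety_score": 0.95,
     "labeler": "nemo_guardrails", "event_timestamp": datetime(2025, 1, 15)},
    {"interaction_id": "int-002", "reward_label": "negative", "safety_score": 0.12,
     "labeler": "", "event_timestamp": datetime(2025, 1, 15)},
]
print(validate_label_rows(batch))  # ['row 1: empty labeler']
```

If the list comes back empty, `pd.DataFrame(batch)` has the same columns as `labels_df` in the push example and can be passed to `store.push()` in the same way.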
