Commit 468ea54

feat: Feast First-Class LabelView Implementation
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
1 parent c559889 commit 468ea54

47 files changed

Lines changed: 2989 additions & 618 deletions

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
 * [Data ingestion](getting-started/concepts/data-ingestion.md)
 * [Entity](getting-started/concepts/entity.md)
 * [Feature view](getting-started/concepts/feature-view.md)
+* [\[Alpha\] Label view](getting-started/concepts/label-view.md)
 * [Feature retrieval](getting-started/concepts/feature-retrieval.md)
 * [Point-in-time joins](getting-started/concepts/point-in-time-joins.md)
 * [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md)
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
# Label View

{% hint style="info" %}
**\[Alpha]** Label views are an alpha feature. The API may change in future releases.
{% endhint %}

A **label view** is a Feast primitive that manages *mutable* labels and annotations, kept separate from the *immutable* feature data stored in regular [feature views](feature-view.md). The separation reflects how the two kinds of data behave: observational data (features) is append-only, while judgments about that data (labels, scores, reward signals) are updated over time by multiple independent sources.

Label views are especially useful in **RLHF/reward-modeling pipelines**, **multi-annotator workflows**, and **safety monitoring systems** where different labelers (human reviewers, automated scanners, reward models) independently write labels for the same entity keys.

## Key Capabilities

- **Multi-labeler support**: Multiple independent labelers can write labels for the same entity key. A configurable `labeler_field` tracks which source wrote each label.
- **Conflict resolution policies**: A `ConflictPolicy` declares how disagreements between labelers should be resolved: last-write-wins, labeler priority, or majority vote. See [Alpha limitations](#alpha-limitations) below.
- **History retention**: Optionally retain the full history of label writes per entity key, not just the latest value. See [Alpha limitations](#alpha-limitations) below.
- **Reference feature view**: Optionally link a label view to the `FeatureView` whose entities it annotates, for documentation and lineage.
- **PushSource integration**: Label views are designed to work with `PushSource`, allowing labels to be written in real time via `FeatureStore.push()`.
- **FeatureService composability**: Label views can be included alongside regular feature views in a `FeatureService`, so training pipelines can retrieve features and their labels together.

## When to use Label Views

| Use a **FeatureView** when… | Use a **LabelView** when… |
|---|---|
| Data is observational and append-only (e.g. driver trip counts, page views) | Data is a judgment or annotation about an entity (e.g. reward labels, safety scores) |
| A single source of truth writes the data | Multiple labelers may write conflicting values for the same key |
| History is naturally time-series | You need explicit control over whether history is retained or overwritten |

## Defining a Label View

```python
from datetime import timedelta

from feast import Entity, FeatureService, Field, PushSource
from feast.labeling import ConflictPolicy, LabelView
from feast.types import Float32, String

interaction = Entity(
    name="interaction",
    join_keys=["interaction_id"],
)

label_source = PushSource(
    name="label_push_source",
    schema=[
        Field(name="interaction_id", dtype=String),
        Field(name="reward_label", dtype=String),
        Field(name="safety_score", dtype=Float32),
        Field(name="labeler", dtype=String),
    ],
)

interaction_labels = LabelView(
    name="interaction_labels",
    entities=[interaction],
    ttl=timedelta(days=90),
    schema=[
        Field(name="interaction_id", dtype=String),
        Field(name="reward_label", dtype=String),
        Field(name="safety_score", dtype=Float32),
        Field(name="labeler", dtype=String),
    ],
    source=label_source,
    labeler_field="labeler",  # column recording which labeler wrote each row
    conflict_policy=ConflictPolicy.LAST_WRITE_WINS,  # stored, not yet enforced (see Alpha Limitations)
    retain_history=True,  # stored, not yet acted on (see Alpha Limitations)
    reference_feature_view="interaction_history",  # FeatureView whose entities these labels annotate
    description="Reward and safety labels on agent interactions.",
    owner="ml-safety-team@example.com",
)
```

## Conflict Policies

The `ConflictPolicy` enum controls how conflicting labels from different labelers are **intended** to be resolved at read time:

| Policy | Behavior |
|---|---|
| `LAST_WRITE_WINS` | The most recently written label for a given entity key takes precedence, regardless of which labeler wrote it. This is the default. |
| `LABELER_PRIORITY` | Labels are ranked by a pre-configured labeler priority order. Higher-priority labelers override lower-priority ones. |
| `MAJORITY_VOTE` | The label value that appears most frequently across all labelers is selected. Useful for consensus-based annotation workflows. |
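
The intended semantics of the three policies can be sketched in plain Python. This is an illustration only, not Feast's implementation: `Policy`, `LabelWrite`, `latest_per_labeler`, and `resolve` are hypothetical names, and the choices to count one vote per labeler and to break ties by recency are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Policy(Enum):
    # Hypothetical stand-in for feast.labeling.ConflictPolicy.
    LAST_WRITE_WINS = "last_write_wins"
    LABELER_PRIORITY = "labeler_priority"
    MAJORITY_VOTE = "majority_vote"


@dataclass(frozen=True)
class LabelWrite:
    labeler: str
    value: str
    written_at: datetime


def latest_per_labeler(writes):
    """Keep only each labeler's most recent write."""
    latest = {}
    for write in sorted(writes, key=lambda w: w.written_at):
        latest[write.labeler] = write  # later writes replace earlier ones
    return list(latest.values())


def resolve(writes, policy, priority=()):
    """Pick one label value from all writes for a single entity key."""
    if not writes:
        return None
    if policy is Policy.LAST_WRITE_WINS:
        return max(writes, key=lambda w: w.written_at).value
    if policy is Policy.LABELER_PRIORITY:
        # Earlier position in `priority` ranks higher; unknown labelers rank last.
        rank = {name: i for i, name in enumerate(priority)}
        newest_first = sorted(writes, key=lambda w: w.written_at, reverse=True)
        # min() returns the first minimal element, so the newest write wins within a rank.
        return min(newest_first, key=lambda w: rank.get(w.labeler, len(rank))).value
    if policy is Policy.MAJORITY_VOTE:
        votes = latest_per_labeler(writes)  # assumption: one vote per labeler
        counts = Counter(v.value for v in votes)
        top = max(counts.values())
        tied = {value for value, count in counts.items() if count == top}
        # Assumption: ties go to the most recent write among the tied values.
        return max((v for v in votes if v.value in tied), key=lambda v: v.written_at).value
    raise ValueError(f"unknown policy: {policy}")
```

Because the alpha runtime does not apply these policies, a caller who needs them today would have to run logic of this kind on label rows it has collected itself.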

## Alpha Limitations

{% hint style="warning" %}
The following capabilities are **defined and stored** in the label-view metadata but are **not yet enforced** by the Feast runtime. They are persisted in the registry so that future releases can activate them without a schema migration.
{% endhint %}

### Conflict-policy enforcement at read time

`conflict_policy` is stored as part of the `LabelView` definition, but it is **not enforced** during `get_online_features`. The online store currently returns the last-written row for a given entity key regardless of which policy is configured.

Real enforcement will require changes to the online-store query path so that the store can consider multiple rows per entity key and apply the conflict-resolution strategy.

### History retention at write time

`retain_history` is stored but **not acted on**. The online store always overwrites the previous value when a new label is written for the same entity key.

Implementing retention will require changes to the online-store write path so that it appends rather than upserts, along with a compaction or eviction strategy for old entries.
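
The difference between the current upsert behavior and an append-with-eviction write path can be illustrated with two toy in-memory stores. Both classes are hypothetical and unrelated to any Feast online-store implementation; the fixed per-key cap stands in for whatever compaction or eviction strategy is eventually chosen.

```python
from collections import defaultdict, deque


class UpsertStore:
    """Keeps only the latest label per entity key (the alpha behavior described above)."""

    def __init__(self):
        self._rows = {}

    def write(self, key, label):
        self._rows[key] = label  # overwrites any previous label for this key

    def read(self, key):
        return self._rows.get(key)


class AppendStore:
    """Hypothetical retention: keeps up to `max_entries` recent labels per key."""

    def __init__(self, max_entries=3):
        # A bounded deque drops its oldest entry once it is full.
        self._rows = defaultdict(lambda: deque(maxlen=max_entries))

    def write(self, key, label):
        self._rows[key].append(label)

    def read_latest(self, key):
        history = self._rows.get(key)
        return history[-1] if history else None

    def read_history(self, key):
        return list(self._rows.get(key, ()))
```

Reading the latest value behaves the same in both stores; only the append variant can answer questions about earlier labels, at the cost of bounded extra storage per key.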

### Batch materialization

Label views are **not included** in `feast materialize` or `feast materialize-incremental`. Labels are ingested via `FeatureStore.push()` (real-time) and do not go through the batch materialization pipeline. Attempting to materialize a label view by name raises an error.

## Using with Feature Services

Label views can be composed with regular feature views in a `FeatureService`, so downstream consumers (training pipelines, batch scoring jobs) get features and labels in a single retrieval call:

```python
training_service = FeatureService(
    name="interaction_training_service",
    features=[
        interaction_history,  # regular FeatureView with immutable features
        interaction_labels,   # LabelView with mutable reward labels
    ],
)
```

## Pushing Labels

Labels are typically written via `FeatureStore.push()` using the label view's `PushSource`:

```python
import pandas as pd
from feast import FeatureStore
from feast.data_source import PushMode

store = FeatureStore(repo_path="feature_repo/")

labels_df = pd.DataFrame({
    "interaction_id": ["int-001", "int-002"],
    "reward_label": ["positive", "negative"],
    "safety_score": [0.95, 0.12],
    "labeler": ["nemo_guardrails", "nemo_guardrails"],
    "event_timestamp": pd.to_datetime(["2025-01-15", "2025-01-15"]),
})

# push() defaults to PushMode.ONLINE, so request both stores explicitly.
store.push("label_push_source", labels_df, to=PushMode.ONLINE_AND_OFFLINE)
```

With `to=PushMode.ONLINE_AND_OFFLINE`, the labels are written to both the online and offline stores, making them available for real-time serving and historical training dataset generation. Without the `to` argument, `push()` writes to the online store only.
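
Since several independent labelers may push rows, it can help to check each batch before calling `push()`. The following is a minimal sketch under stated assumptions: `validate_label_rows` and `REQUIRED_COLUMNS` are hypothetical helpers, not part of Feast, and the column list simply mirrors the example schema above plus `event_timestamp`.

```python
# Hypothetical column list, copied from the example schema plus the timestamp column.
REQUIRED_COLUMNS = (
    "interaction_id",
    "reward_label",
    "safety_score",
    "labeler",
    "event_timestamp",
)


def validate_label_rows(rows, labeler_field="labeler"):
    """Return a list of human-readable problems; an empty list means the batch looks fine."""
    errors = []
    for index, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            errors.append(f"row {index}: missing columns {missing}")
            continue
        if not row[labeler_field]:
            # Without a labeler name, the write cannot be attributed to a source.
            errors.append(f"row {index}: empty {labeler_field!r}")
    return errors
```

With a pandas DataFrame, the same check could be run on `labels_df.to_dict("records")` before the push.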
