| 1 | +# Label View |
| 2 | + |
| 3 | +{% hint style="info" %} |
| 4 | +**\[Alpha]** Label views are an alpha feature. The API may change in future releases. |
| 5 | +{% endhint %} |
| 6 | + |
| 7 | +A **label view** is a Feast primitive that manages *mutable* labels and annotations, kept separate from the *immutable* feature data stored in regular [feature views](feature-view.md). The separation reflects a simple distinction: observational data (features) is append-only, while judgments about that data (labels, scores, reward signals) can be revised over time by multiple independent sources. |
| 8 | + |
| 9 | +Label views are especially useful in **RLHF/reward-modeling pipelines**, **multi-annotator workflows**, and **safety monitoring systems** where different labelers — human reviewers, automated scanners, reward models — independently write labels for the same entity keys. |
| 10 | + |
| 11 | +## Key Capabilities |
| 12 | + |
| 13 | +- **Multi-labeler support**: Multiple independent labelers can write labels for the same entity key. A configurable `labeler_field` tracks which source wrote each label. |
| 14 | +- **Conflict resolution policies**: A `ConflictPolicy` (last-write-wins, labeler priority, or majority vote) declares how disagreements between labelers should be resolved. The policy is stored but not yet enforced; see [Alpha limitations](#alpha-limitations) below. |
| 15 | +- **History retention**: A `retain_history` flag declares whether the full history of label writes per entity key should be kept, rather than just the latest value. The flag is stored but not yet acted on; see [Alpha limitations](#alpha-limitations) below. |
| 16 | +- **Reference feature view**: Optionally link a label view to the `FeatureView` whose entities it annotates, for documentation and lineage. |
| 17 | +- **PushSource integration**: Label views are designed to work with `PushSource`, allowing labels to be written in real time via `FeatureStore.push()`. |
| 18 | +- **FeatureService composability**: Label views can be included alongside regular feature views in a `FeatureService`, so training pipelines can retrieve features and their labels together. |
| 19 | + |
| 20 | +## When to Use Label Views |
| 21 | + |
| 22 | +| Use a **FeatureView** when… | Use a **LabelView** when… | |
| 23 | +|---|---| |
| 24 | +| Data is observational and append-only (e.g. driver trip counts, page views) | Data is a judgment or annotation about an entity (e.g. reward labels, safety scores) | |
| 25 | +| A single source of truth writes the data | Multiple labelers may write conflicting values for the same key | |
| 26 | +| History is naturally time-series | You need explicit control over whether history is retained or overwritten | |
| 27 | + |
| 28 | +## Defining a Label View |
| 29 | + |
| 30 | +```python |
| 31 | +from datetime import timedelta |
| 32 | + |
| 33 | +from feast import Entity, FeatureService, Field, PushSource |
| 34 | +from feast.labeling import ConflictPolicy, LabelView |
| 35 | +from feast.types import Float32, String |
| 36 | + |
| 37 | +interaction = Entity( |
| 38 | +    name="interaction", |
| 39 | +    join_keys=["interaction_id"], |
| 40 | +) |
| 41 | + |
| 42 | +label_source = PushSource( |
| 43 | +    name="label_push_source", |
| 44 | +    schema=[ |
| 45 | +        Field(name="interaction_id", dtype=String), |
| 46 | +        Field(name="reward_label", dtype=String), |
| 47 | +        Field(name="safety_score", dtype=Float32), |
| 48 | +        Field(name="labeler", dtype=String), |
| 49 | +    ], |
| 50 | +) |
| 51 | + |
| 52 | +interaction_labels = LabelView( |
| 53 | +    name="interaction_labels", |
| 54 | +    entities=[interaction], |
| 55 | +    ttl=timedelta(days=90), |
| 56 | +    schema=[ |
| 57 | +        Field(name="interaction_id", dtype=String), |
| 58 | +        Field(name="reward_label", dtype=String), |
| 59 | +        Field(name="safety_score", dtype=Float32), |
| 60 | +        Field(name="labeler", dtype=String), |
| 61 | +    ], |
| 62 | +    source=label_source, |
| 63 | +    labeler_field="labeler", |
| 64 | +    conflict_policy=ConflictPolicy.LAST_WRITE_WINS, |
| 65 | +    retain_history=True, |
| 66 | +    reference_feature_view="interaction_history", |
| 67 | +    description="Reward and safety labels on agent interactions.", |
| 68 | +    owner="ml-safety-team@example.com", |
| 69 | +) |
| 70 | +``` |
| 71 | + |
| 72 | +## Conflict Policies |
| 73 | + |
| 74 | +The `ConflictPolicy` enum controls how conflicting labels from different labelers are **intended** to be resolved at read time: |
| 75 | + |
| 76 | +| Policy | Behavior | |
| 77 | +|---|---| |
| 78 | +| `LAST_WRITE_WINS` | The most recently written label for a given entity key takes precedence, regardless of which labeler wrote it. This is the default. | |
| 79 | +| `LABELER_PRIORITY` | Labels are ranked by a pre-configured labeler priority order. Higher-priority labelers override lower-priority ones. | |
| 80 | +| `MAJORITY_VOTE` | The label value that appears most frequently across all labelers is selected. Useful for consensus-based annotation workflows. | |
| 81 | + |
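The three policies in the table can be sketched in plain Python to make their differences concrete. This is an illustration only, not Feast's implementation: the `LabelWrite` record, the function names, the explicit `priority` list, and the choice to count only each labeler's latest value in the majority vote are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class LabelWrite:
    """One label written by one labeler (hypothetical record type)."""
    labeler: str
    value: str
    written_at: datetime


def last_write_wins(writes: Sequence[LabelWrite]) -> Optional[str]:
    # The most recent write wins, regardless of which labeler wrote it.
    if not writes:
        return None
    return max(writes, key=lambda w: w.written_at).value


def labeler_priority(writes: Sequence[LabelWrite], priority: Sequence[str]) -> Optional[str]:
    # `priority` lists labelers from highest to lowest (assumed configuration shape).
    rank = {name: i for i, name in enumerate(priority)}
    ranked = [w for w in writes if w.labeler in rank]
    if not ranked:
        return None
    # Sort newest first so that min() picks the newest write of the top labeler.
    newest_first = sorted(ranked, key=lambda w: w.written_at, reverse=True)
    return min(newest_first, key=lambda w: rank[w.labeler]).value


def majority_vote(writes: Sequence[LabelWrite]) -> Optional[str]:
    # Count only each labeler's latest value, so no labeler votes twice.
    latest: Dict[str, str] = {}
    for w in sorted(writes, key=lambda w: w.written_at):
        latest[w.labeler] = w.value
    if not latest:
        return None
    # Ties fall to whichever value Counter lists first (an arbitrary choice here).
    return Counter(latest.values()).most_common(1)[0][0]


writes = [
    LabelWrite("human_reviewer", "negative", datetime(2025, 1, 15, 9, 0)),
    LabelWrite("reward_model", "positive", datetime(2025, 1, 15, 10, 0)),
    LabelWrite("nemo_guardrails", "positive", datetime(2025, 1, 15, 11, 0)),
]

print(last_write_wins(writes))  # positive
print(labeler_priority(writes, ["human_reviewer", "reward_model", "nemo_guardrails"]))  # negative
print(majority_vote(writes))  # positive
```

The same three writes yield different answers under `LABELER_PRIORITY` than under the other two policies, which is why the choice of policy matters once several labelers cover the same entity key.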
| 82 | +## Alpha Limitations |
| 83 | + |
| 84 | +{% hint style="warning" %} |
| 85 | +The following capabilities are **defined and stored** in the label-view metadata but are **not yet enforced** by the Feast runtime. They are persisted in the registry so that future releases can activate them without a schema migration. |
| 86 | +{% endhint %} |
| 87 | + |
| 88 | +### Conflict-policy enforcement at read time |
| 89 | + |
| 90 | +`conflict_policy` is stored as part of the `LabelView` definition, but it is **not enforced** during `get_online_features`. The online store currently returns the last-written row for a given entity key regardless of which policy is configured. |
| 91 | + |
| 92 | +Real enforcement will require changes to the online-store query path so that the store can consider multiple rows per entity key and apply the conflict-resolution strategy. |
| 93 | + |
| 94 | +### History retention at write time |
| 95 | + |
| 96 | +`retain_history` is stored but **not acted on**. The online store always overwrites the previous value when a new label is written for the same entity key. |
| 97 | + |
| 98 | +Implementing retention will require changes to the online-store write path so that it appends rather than upserts, along with a compaction or eviction strategy for old entries. |
| 99 | + |
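The difference between upserting and appending with eviction can be sketched with two toy in-memory stores. This is a hypothetical illustration of the write-path change described above, not Feast code: the class names and the `max_entries` bound are assumptions, and a real online store would also need to track timestamps and labelers.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


class UpsertStore:
    """Keeps only the latest label per entity key (the overwrite behavior)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def write(self, key: str, label: str) -> None:
        self._rows[key] = label  # the previous value is lost

    def read(self, key: str) -> Optional[str]:
        return self._rows.get(key)


class AppendStore:
    """Keeps up to `max_entries` labels per key and evicts the oldest."""

    def __init__(self, max_entries: int = 3) -> None:
        # deque(maxlen=...) drops the oldest entry when a new one overflows it.
        self._rows: Dict[str, Deque[str]] = defaultdict(lambda: deque(maxlen=max_entries))

    def write(self, key: str, label: str) -> None:
        self._rows[key].append(label)

    def history(self, key: str) -> List[str]:
        return list(self._rows.get(key, ()))

    def read(self, key: str) -> Optional[str]:
        rows = self._rows.get(key)
        return rows[-1] if rows else None


upsert, append = UpsertStore(), AppendStore(max_entries=3)
for label in ["positive", "negative", "neutral", "positive"]:
    upsert.write("int-001", label)
    append.write("int-001", label)

print(upsert.read("int-001"))     # positive
print(append.history("int-001"))  # ['negative', 'neutral', 'positive']
```

A bounded history like this is one simple eviction strategy; time-based eviction tied to the view's `ttl` would be another option.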
| 100 | +### Batch materialization |
| 101 | + |
| 102 | +Label views are **not included** in `feast materialize` or `feast materialize-incremental`. Labels are ingested in real time via `FeatureStore.push()` and do not go through the batch materialization pipeline. Attempting to materialize a label view by name raises an error. |
| 103 | + |
| 104 | +## Using with Feature Services |
| 105 | + |
| 106 | +Label views can be composed with regular feature views in a `FeatureService`, so downstream consumers (training pipelines, batch scoring jobs) get features and labels in a single retrieval call: |
| 107 | + |
| 108 | +```python |
| 109 | +training_service = FeatureService( |
| 110 | +    name="interaction_training_service", |
| 111 | +    features=[ |
| 112 | +        interaction_history,  # regular FeatureView with immutable features |
| 113 | +        interaction_labels,  # LabelView with mutable reward labels |
| 114 | +    ], |
| 115 | +) |
| 116 | +``` |
| 117 | + |
| 118 | +## Pushing Labels |
| 119 | + |
| 120 | +Labels are typically written via `FeatureStore.push()` using the label view's `PushSource`: |
| 121 | + |
| 122 | +```python |
| 123 | +import pandas as pd |
| 124 | +from feast import FeatureStore |
| 125 | + |
| 126 | +store = FeatureStore(repo_path="feature_repo/") |
| 127 | + |
| 128 | +labels_df = pd.DataFrame({ |
| 129 | +    "interaction_id": ["int-001", "int-002"], |
| 130 | +    "reward_label": ["positive", "negative"], |
| 131 | +    "safety_score": [0.95, 0.12], |
| 132 | +    "labeler": ["nemo_guardrails", "nemo_guardrails"], |
| 133 | +    "event_timestamp": pd.to_datetime(["2025-01-15", "2025-01-15"]), |
| 134 | +}) |
| 135 | + |
| 136 | +store.push("label_push_source", labels_df) |
| 137 | +``` |
| 138 | + |
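| 139 | +With the default `to=PushMode.ONLINE`, `push()` writes only to the online store, which makes the labels available for real-time serving. To also make them available for historical training dataset generation, pass `to=PushMode.ONLINE_AND_OFFLINE` (imported from `feast.data_source`) so that the labels are written to the offline store as well. |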