Commit 67de4e9

Layer extension: register custom layers without modifying OAP source
`Layer` becomes a registry-backed value type with the same public API as the prior enum (`Layer.SERVICE`, `value()`, `name()`, `valueOf(int)`, `valueOf(String)`, `nameOf(String)`, `values()`). Built-in layers stay as `public static final Layer` constants with frozen ordinals 0–49.

External layers can be registered through any of three new paths, all funneling through `Layer.register(name, ordinal, normal)`:

- `layer-extensions.yml` on the OAP classpath / config dir: operator config; one line per layer, no code change required.
- `LayerExtension` Java SPI under `META-INF/services/`: for plugin jars.
- Inline `layerDefinitions:` block at the top of any MAL or LAL rule file: the cleanest path when a layer ships together with the rules that produce its telemetry.

Storage encoding is unchanged: layers persist by ordinal int in BanyanDB, Elasticsearch, and JDBC. Built-in ordinals 0–49 stay frozen; 50–999 are reserved by convention for future built-ins; external layers are recommended to start at >= 1000. The recommendation is informational; collisions are reported loudly at boot via the ordinal-uniqueness check.

The registry is sealed at the start of `Core.notifyAfterCompleted()`, after every module's prepare/start has run. Subsequent `Layer.register` calls throw, so runtime-rule MAL/LAL `/addOrUpdate` requests now reject inline `layerDefinitions:` with a clear error pointing operators at the boot-time registration paths.

`UITemplateInitializer` no longer iterates `Layer.values()` for folder discovery; it walks `ui-initialized-templates/**/*.json` recursively and trusts each template's own `configuration.layer` field, so dashboards for external layers are auto-discovered without code changes.

`Layer.values()` now throws before seal: pre-seal, the registry may still grow, and a partial snapshot would silently mislead callers. The cached sorted snapshot is computed once at seal, so post-seal calls are O(1) plus an array clone.

BanyanDB `MetadataRegistry.parseTagSpec` previously relied on `Class.isEnum()` to map Layer columns to TAG_TYPE_INT; with `Layer` no longer an enum, that path now treats `Layer.class` explicitly, matching the existing ES / JDBC handling.
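
The register / seal / snapshot behaviour described in this message can be illustrated with a small stand-in. This is only a sketch under stated assumptions, not the code in `Layer.java`: the class `LayerRegistrySketch`, its nested `Entry` type, the use of `IllegalStateException`, and the choice to treat a registration as "identical" only when name, ordinal, and `normal` all match are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in only; it does not reproduce the real OAP Layer class. */
final class LayerRegistrySketch {

    /** Immutable value object standing in for one layer. */
    static final class Entry {
        final String name;
        final int ordinal;
        final boolean normal;

        Entry(String name, int ordinal, boolean normal) {
            this.name = name;
            this.ordinal = ordinal;
            this.normal = normal;
        }
    }

    private final Map<String, Entry> byName = new HashMap<>();
    private final Map<Integer, Entry> byOrdinal = new HashMap<>();
    private Entry[] sealedSnapshot; // stays null until seal() runs

    synchronized Entry register(String name, int ordinal, boolean normal) {
        if (sealedSnapshot != null) {
            throw new IllegalStateException("registry sealed, cannot register " + name);
        }
        Entry sameName = byName.get(name);
        Entry sameOrdinal = byOrdinal.get(ordinal);
        // Assumption: "identical" means name, ordinal and normal flag all match.
        if (sameName != null && sameName == sameOrdinal && sameName.normal == normal) {
            return sameName; // identical re-registration is a no-op
        }
        if (sameName != null || sameOrdinal != null) {
            throw new IllegalStateException("conflicting registration: " + name + " / " + ordinal);
        }
        Entry created = new Entry(name, ordinal, normal);
        byName.put(name, created);
        byOrdinal.put(ordinal, created);
        return created;
    }

    /** Freezes the registry and caches a snapshot sorted by ordinal. */
    synchronized void seal() {
        if (sealedSnapshot != null) {
            return;
        }
        List<Entry> sorted = new ArrayList<>(byOrdinal.values());
        sorted.sort(Comparator.comparingInt(e -> e.ordinal));
        sealedSnapshot = sorted.toArray(new Entry[0]);
    }

    /** Throws before seal, because a partial snapshot could mislead callers. */
    synchronized Entry[] values() {
        if (sealedSnapshot == null) {
            throw new IllegalStateException("values() called before seal()");
        }
        return sealedSnapshot.clone();
    }

    public static void main(String[] args) {
        LayerRegistrySketch registry = new LayerRegistrySketch();
        registry.register("IOT_FLEET", 1000, true);
        registry.register("IOT_FLEET", 1000, true); // accepted as a no-op
        registry.seal();
        System.out.println("layers after seal: " + registry.values().length);
    }
}
```

The sketch keeps the registry as an instance rather than static state so that it can be exercised repeatedly in isolation; the commit message describes static registration on `Layer` itself.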
1 parent dae1553 commit 67de4e9

19 files changed

Lines changed: 783 additions & 245 deletions

.claude/skills/new-monitoring-feature/SKILL.md

Lines changed: 21 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -21,11 +21,22 @@ There is one skill per narrow concern. This one is the wiring map.
2121

2222
## 0. Register the `Layer` — the feature's entry point
2323

24-
A `Layer` is how OAP slices services / instances / endpoints by data source. **Every new feature needs a new `Layer` enum value.** The UI, storage partitioning, menu navigation, and OAL aggregation all key off it.
24+
A `Layer` is how OAP slices services / instances / endpoints by data source. The UI, storage partitioning, menu navigation, and OAL aggregation all key off it.
2525

26-
**Only one place to edit**: `oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java`. Add a new enum constant with a unique id and `normal` flag. Ids are never reused; pick the next integer. Examples: `IOS(47, true)`, `APISIX(27, true)`, `VIRTUAL_DATABASE(11, false)` for inferred/non-real services.
26+
`Layer` is a registry-backed value type (no longer a closed enum). Built-in layers are declared as `public static final Layer` constants in `oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java`; external layers are registered through the `Layer.register(name, ordinal, normal)` API at boot. **Pick the registration path that matches the scope of your feature:**
2727

28-
UI template folders are auto-discovered: `UITemplateInitializer.UI_TEMPLATE_FOLDER` is computed from `Layer.values()` + `"custom"` at class-init time. Drop a `ui-initialized-templates/<layer-name-lowercased>/` folder on disk and the initializer picks it up on the next boot. Missing folders are silently skipped. There is no allowlist to append to.
28+
| Your feature ships as | Registration path |
29+
|---|---|
30+
| Part of the OAP distribution (in-tree, the common case for new SkyWalking-supported targets) | Add a `public static final Layer` constant to `Layer.java` with the next sequential ordinal in `0–49`. Examples: `IOS = register("IOS", 47, true)`, `APISIX = register("APISIX", 27, true)`, `VIRTUAL_DATABASE = register("VIRTUAL_DATABASE", 11, false)` for inferred/non-real services. |
31+
| An out-of-tree MAL or LAL rule file | Add a top-level `layerDefinitions:` block to the rule file. The DSL loader funnels each entry through `Layer.register` before compiling the rule. One file ships the layer + the rules that produce its telemetry. |
32+
| An out-of-tree plugin module (jar) | Implement `org.apache.skywalking.oap.server.core.analysis.LayerExtension` and register via `META-INF/services/`. Discovered by `LayerExtensionLoader` during `CoreModuleProvider.prepare()`. |
33+
| Operator-deployed config (no code, no DSL) | Add an entry to `oap-server/server-starter/src/main/resources/layer-extensions.yml` (or override on the OAP node's classpath). |
34+
35+
**Ordinal conventions:** `0–49` is in active use by built-ins. `50–999` is reserved by convention for future built-in layers. External layers are recommended (not required) to start at `>= 1000` to avoid colliding with future built-ins on OAP upgrade. Collisions in either direction are detected at boot via the ordinal-uniqueness check, which fails OAP startup loudly.
36+
37+
**Storage encoding is the ordinal int**, persisted in BanyanDB / Elasticsearch / JDBC. Every OAP node that reads or writes a given layer must agree on its `(name, ordinal)` mapping — deploy `layer-extensions.yml` and any `layerDefinitions:` rule files identically across all nodes. The registry is sealed at the start of `Core.notifyAfterCompleted()`; later registration attempts throw.
38+
39+
**UI template folders are auto-discovered by file scan, not by `Layer.values()`.** `UITemplateInitializer` walks `ui-initialized-templates/**/*.json` recursively (depth 2) and trusts each template's own `configuration.layer` field. Drop a folder of dashboard JSONs on disk and the initializer picks them up on the next boot — folder name is purely organizational.
2940

3041
**Component ID lookup in Java code**: IDs declared in `component-libraries.yml` are loaded at runtime into `ComponentLibraryCatalogService`'s `componentName2Id` map — they are **not** exposed as Java enum constants. To look up by name in listener code, inject the catalog service and resolve once at construction:
3142
```java
@@ -35,11 +46,14 @@ int myComponentId = catalog.getComponentId("My-Component-Name");
3546
```
3647
Cache as an `int` field; runtime comparisons are then plain `componentId == myComponentId`. **Trap:** there is a `ComponentsDefine` class under `skywalking-trace-receiver-plugin/src/test/java/.../mock/ComponentsDefine.java` — it is a test-only mock holding four hand-picked constants (Tomcat, Dubbo, RocketMQ, MongoDB). Do not import or extend it from production code.
3748

38-
Emit the layer from every source object your feature produces:
49+
Emit the layer from every source object your feature produces. Built-in layers have a static-field accessor; external layers are looked up by name through the registry:
50+
3951
```java
40-
service.setLayer(Layer.<YOUR_LAYER>);
41-
serviceInstance.setServiceLayer(Layer.<YOUR_LAYER>);
42-
endpoint.setServiceLayer(Layer.<YOUR_LAYER>);
52+
// Built-in layer (constant)
53+
service.setLayer(Layer.IOS);
54+
55+
// External layer (registered via yaml / SPI / layerDefinitions:)
56+
service.setLayer(Layer.nameOf("IOT_FLEET"));
4357
```
4458

4559
Downstream (the core OAL, `service ly <LAYER>` swctl query, topology filters, UI root dashboard's layer selector) all work off this single enum value.
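
The ordinal conventions introduced in the `SKILL.md` change (`0–49` in use by built-ins, `50–999` reserved for future built-ins, `>= 1000` recommended for external layers) can be expressed as a small classifier. This is a hedged sketch: `OrdinalBandSketch`, its `Band` enum, and the rejection of negative ordinals are hypothetical, and the commit states that the bands are a convention rather than an enforced rule.

```java
/** Hypothetical helper that maps an ordinal to the band named in the conventions. */
final class OrdinalBandSketch {

    enum Band { BUILT_IN, RESERVED_FOR_FUTURE_BUILT_INS, EXTERNAL_RECOMMENDED }

    static Band classify(int ordinal) {
        // Assumption: negative ordinals are never valid.
        if (ordinal < 0) {
            throw new IllegalArgumentException("ordinal must be non-negative: " + ordinal);
        }
        if (ordinal <= 49) {
            return Band.BUILT_IN;
        }
        if (ordinal <= 999) {
            return Band.RESERVED_FOR_FUTURE_BUILT_INS;
        }
        return Band.EXTERNAL_RECOMMENDED;
    }

    public static void main(String[] args) {
        for (int ordinal : new int[] {47, 50, 999, 1000}) {
            System.out.println(ordinal + " -> " + classify(ordinal));
        }
    }
}
```

A check like this could back a boot-time warning for external layers that land below 1000, leaving the hard failure to the ordinal-uniqueness check.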

docs/en/changes/changes.md

Lines changed: 1 addition & 0 deletions
@@ -97,6 +97,7 @@
9797
* MAL: add `safeDiv(divisor)` on `SampleFamily` that yields `0` when the divisor is `0` instead of `Infinity`/`NaN`. Replace `/` with `safeDiv(...)` in Envoy AI Gateway latency-average rules so `sum / count * 1000` no longer produces dropped or out-of-range samples when a counter is zero in a window.
9898
* Fix: `envoy-ai-gateway` metrics rules, make the metrics value return `0` when the divisor is `0`.
9999
* Fix: LAL compiler treated `(tag("x") as Integer) + (tag("y") as Integer)` as string concatenation instead of numeric addition. Expressions like `input_tokens + output_tokens < 10000` produced the concatenated string `"2589115"` rather than the integer sum `2704`, so token-threshold conditions never triggered `abort {}`. The compiler now detects all-numeric operands (cast to `Integer` or `Long`) and emits proper `long` arithmetic.
100+
* Custom `Layer`s can be declared without modifying the OAP source — via an operator-managed `layer-extensions.yml`, inline `layerDefinitions:` block in a MAL or LAL rule file, or a plugin extension. UI dashboard templates for new layers are auto-discovered from the `ui-initialized-templates/` directory. Recommended ordinal range for external layers is `>= 1000`; conflicting names or ordinals are reported at boot.
100101

101102
#### UI
102103
* Add mobile menu icon and i18n labels for the iOS layer.

docs/en/concepts-and-designs/lal.md

Lines changed: 45 additions & 0 deletions
@@ -30,6 +30,51 @@ Use `tag 'key': sourceAttribute("attr")` in the extractor to selectively persist
3030
Layer should be declared in the LAL script to represent the analysis scope of the logs.
3131
LAL rules are routed by layer — only rules matching the incoming log's layer are evaluated.
3232

33+
### Inline layer declarations (`layerDefinitions:`)
34+
35+
A LAL file may declare its own custom layers with a top-level `layerDefinitions:` block.
36+
Each entry is funneled through `Layer.register(name, ordinal, normal)` **before**
37+
the rules in the same file compile, so a LAL file is fully self-describing — a new
38+
monitoring target can land as a single LAL file without an enum edit elsewhere in the OAP
39+
source.
40+
41+
```yaml
42+
layerDefinitions:
43+
- name: IOT_FLEET # upper-snake-case, must match [A-Z][A-Z0-9_]*
44+
ordinal: 1000 # unique across all layers; >= 1000 recommended
45+
normal: true # true = agent-installed (default), false = conjectured/virtual
46+
47+
rules:
48+
- name: iot-fleet-access
49+
layer: IOT_FLEET
50+
dsl: |
51+
filter {
52+
text { regexp $/(?<status>\d+)\s+(?<path>\S+)/$ }
53+
sink { sampler { rateLimit { rpm 1800 } } }
54+
}
55+
```
56+
57+
Notes:
58+
- **Storage encoding is the ordinal int**, persisted in BanyanDB / Elasticsearch / JDBC.
59+
Every OAP node that reads or writes a given layer must agree on its `(name, ordinal)`
60+
mapping — deploy a LAL file with `layerDefinitions:` identically across all nodes.
61+
- **Identical re-registration is a no-op**, so the same `IOT_FLEET` entry can appear in
62+
multiple LAL files (and additionally in a MAL file, in `layer-extensions.yml`, or via the
63+
`LayerExtension` SPI). Conflicting registrations cause OAP boot to fail loudly with the
64+
offending file in the stack trace.
65+
- **Ordinals 0–49** are in active use by the OAP distribution's built-in layers; **50–999**
66+
are reserved by convention for future built-ins. External layers should start at `>= 1000`
67+
— enforcement is not strict, but staying above the reserved band avoids upgrade-time
68+
collisions.
69+
- `layer: auto` works with extension layers too — the extractor body can call
70+
`layer "IOT_FLEET"` and the runtime resolves it through the registry.
71+
72+
Three other registration paths exist for layers that are **not** specific to a LAL file: an
73+
operator-managed `layer-extensions.yml`, a `LayerExtension` Java SPI for plugin jars, and
74+
the built-in static fields in `Layer.java` for distribution layers. See
75+
[`Layer.java`](../../../oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java)
76+
javadoc for the full picture.
77+
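
The YAML comment in the added `lal.md` section requires layer names to match `[A-Z][A-Z0-9_]*`. A minimal sketch of that check follows; the class `LayerNameSketch` and method `isValidName` are hypothetical, and where the real validation lives is not shown in this commit view.

```java
import java.util.regex.Pattern;

/** Hypothetical validator for the upper-snake-case layer name pattern. */
final class LayerNameSketch {

    // Pattern copied from the documentation comment: [A-Z][A-Z0-9_]*
    private static final Pattern NAME = Pattern.compile("[A-Z][A-Z0-9_]*");

    static boolean isValidName(String name) {
        // matches() requires the whole string to fit the pattern.
        return name != null && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"IOT_FLEET", "iot_fleet", "1FLEET"}) {
            System.out.println(candidate + " valid? " + isValidName(candidate));
        }
    }
}
```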
3378
When `layer: auto` is declared, the rule matches logs where `service.layer` is absent (common for OTLP
3479
sources that don't set this attribute). The script is expected to set the layer in the extractor:
3580

docs/en/concepts-and-designs/mal.md

Lines changed: 38 additions & 0 deletions
@@ -377,6 +377,44 @@ name: <string>
377377
exp: <string>
378378
```
379379
380+
### <layer_definitions>
381+
382+
Optional top-level block for declaring custom layers inline alongside the rules that produce
383+
their telemetry. Each entry is funneled through `Layer.register(name, ordinal, normal)`
384+
**before** the rules in the same file compile, so a MAL file is fully self-describing — a new
385+
monitoring target can land as a single MAL file without an enum edit elsewhere in the OAP source.
386+
387+
```yaml
388+
layerDefinitions:
389+
- name: IOT_FLEET # upper-snake-case, must match [A-Z][A-Z0-9_]*
390+
ordinal: 1000 # unique across all layers; >= 1000 recommended
391+
normal: true # true = agent-installed (default), false = conjectured/virtual
392+
393+
metricsRules:
394+
- name: device_battery_percentage
395+
exp: iot_device_battery_level.tagAverage(['service'], ['host'])
396+
expSuffix: instance(['host'], ['service'], Layer.nameOf('IOT_FLEET'))
397+
```
398+
399+
Notes:
400+
- **Storage encoding is the ordinal int**, persisted in BanyanDB / Elasticsearch / JDBC. Every
401+
OAP node that reads or writes a given layer must agree on its `(name, ordinal)` mapping —
402+
deploy a MAL file with `layerDefinitions:` identically across all nodes.
403+
- **Identical re-registration is a no-op**, so the same `IOT_FLEET` entry can appear in multiple
404+
MAL files (and additionally in a LAL file, in `layer-extensions.yml`, or via the
405+
`LayerExtension` SPI). Conflicting registrations (same name with different ordinal, or same
406+
ordinal with different name) cause OAP boot to fail loudly with the offending file in the
407+
stack trace.
408+
- **Ordinals 0–49** are in active use by the OAP distribution's built-in layers; **50–999** are
409+
reserved by convention for future built-ins. External layers should start at `>= 1000` —
410+
enforcement is not strict, but staying above the reserved band avoids upgrade-time collisions.
411+
412+
Three other registration paths exist for layers that are **not** specific to a MAL file: an
413+
operator-managed `layer-extensions.yml`, a `LayerExtension` Java SPI for plugin jars, and the
414+
built-in static fields in `Layer.java` for distribution layers. See
415+
[`Layer.java`](../../../oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java)
416+
javadoc for the full picture.
417+
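
The added `mal.md` notes require every OAP node to agree on each layer's `(name, ordinal)` mapping. One way to picture that requirement is a comparison of two nodes' mappings before rollout. The sketch below is an assumption-laden illustration: `LayerMappingDiffSketch` and `disagreements` are hypothetical names, and the commit does not describe any such tool.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical pre-deployment check comparing layer mappings of two nodes. */
final class LayerMappingDiffSketch {

    static List<String> disagreements(Map<String, Integer> nodeA, Map<String, Integer> nodeB) {
        // Invert node B so that ordinal reuse under another name is also visible.
        Map<Integer, String> namesByOrdinalOnB = new HashMap<>();
        for (Map.Entry<String, Integer> e : nodeB.entrySet()) {
            namesByOrdinalOnB.put(e.getValue(), e.getKey());
        }
        List<String> problems = new ArrayList<>();
        // TreeMap gives a stable, name-sorted report order.
        for (Map.Entry<String, Integer> e : new TreeMap<>(nodeA).entrySet()) {
            Integer ordinalOnB = nodeB.get(e.getKey());
            if (ordinalOnB != null && !ordinalOnB.equals(e.getValue())) {
                problems.add("name " + e.getKey() + ": ordinal " + e.getValue() + " vs " + ordinalOnB);
            }
            String nameOnB = namesByOrdinalOnB.get(e.getValue());
            if (nameOnB != null && !nameOnB.equals(e.getKey())) {
                problems.add("ordinal " + e.getValue() + ": name " + e.getKey() + " vs " + nameOnB);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Integer> nodeA = new HashMap<>();
        nodeA.put("IOT_FLEET", 1000);
        Map<String, Integer> nodeB = new HashMap<>();
        nodeB.put("IOT_FLEET", 1001);
        System.out.println(disagreements(nodeA, nodeB));
    }
}
```

Layers present on only one node are not reported here; whether that counts as a problem depends on which nodes read or write the layer.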
380418
## More Examples
381419

382420
Please refer to [OAP Self-Observability](../../../oap-server/server-starter/src/main/resources/otel-rules/oap.yaml).

oap-server/analyzer/log-analyzer/src/main/java/org/apache/skywalking/oap/log/analyzer/v2/provider/LALConfigs.java

Lines changed: 30 additions & 0 deletions
@@ -33,6 +33,8 @@
3333
import java.util.Map;
3434
import lombok.Data;
3535
import lombok.extern.slf4j.Slf4j;
36+
import org.apache.skywalking.oap.server.core.UnexpectedException;
37+
import org.apache.skywalking.oap.server.core.analysis.LayerDefinition;
3638
import org.apache.skywalking.oap.server.core.rule.ext.RuleSetMerger;
3739
import org.apache.skywalking.oap.server.library.module.ModuleManager;
3840
import org.apache.skywalking.oap.server.library.module.ModuleStartException;
@@ -48,6 +50,12 @@
4850
@Slf4j
4951
public class LALConfigs {
5052
private List<LALConfig> rules;
53+
/**
54+
* Optional inline layer registrations. When present, each entry is registered through
55+
* {@code Layer.register(...)} before the rules in this file are compiled, so a
56+
* LAL file is self-describing for any custom layers it references.
57+
*/
58+
private List<LayerDefinition> layerDefinitions;
5159

5260
public static List<LALConfigs> load(final String path, final List<String> files) throws Exception {
5361
return loadInternal(path, files, null, /* useInstalledManager= */ true);
@@ -127,6 +135,7 @@ private static List<LALConfigs> loadInternal(final String path, final List<Strin
127135
if (configs == null || configs.getRules() == null) {
128136
continue;
129137
}
138+
registerInlineLayers(ruleName, configs);
130139
// sourceFileName is only present for entries that came from disk; resolver-
131140
// only rules synthesise a name so diagnostics still print something.
132141
final String src = sourceFileName.getOrDefault(ruleName, ruleName + ".yaml");
@@ -141,4 +150,25 @@ private static List<LALConfigs> loadInternal(final String path, final List<Strin
141150
throw new ModuleStartException("Failed to load LAL config rules", e);
142151
}
143152
}
153+
154+
/**
155+
* Funnel any inline {@code layerDefinitions:} entries through {@code Layer.register}.
156+
* Conflict checks (reserved-range, name uniqueness, ordinal uniqueness, sealed-state) live
157+
* in {@code Layer.register}; failures here surface with the offending rule name in
158+
* the stack trace, which is enough for an operator to find the bad file.
159+
*/
160+
private static void registerInlineLayers(final String ruleName, final LALConfigs configs) {
161+
final List<LayerDefinition> defs = configs.getLayerDefinitions();
162+
if (defs == null || defs.isEmpty()) {
163+
return;
164+
}
165+
for (final LayerDefinition def : defs) {
166+
try {
167+
def.register();
168+
} catch (RuntimeException e) {
169+
throw new UnexpectedException(
170+
"LAL rule " + ruleName + " layerDefinitions entry rejected: " + def, e);
171+
}
172+
}
173+
}
144174
}

oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/prometheus/rule/Rule.java

Lines changed: 7 additions & 0 deletions
@@ -21,6 +21,7 @@
2121
import lombok.Data;
2222
import lombok.NoArgsConstructor;
2323
import org.apache.skywalking.oap.meter.analyzer.v2.MetricRuleConfig;
24+
import org.apache.skywalking.oap.server.core.analysis.LayerDefinition;
2425

2526
import java.util.List;
2627

@@ -36,6 +37,12 @@ public class Rule implements MetricRuleConfig {
3637
private String expPrefix;
3738
private String filter;
3839
private List<MetricsRule> metricsRules;
40+
/**
41+
* Optional inline layer registrations. When present, each entry is registered through
42+
* {@code Layer.register(...)} before the rule's expressions compile, so the
43+
* rule file is self-describing for any custom layers it references.
44+
*/
45+
private List<LayerDefinition> layerDefinitions;
3946

4047
@Override
4148
public String getSourceName() {

oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/prometheus/rule/Rules.java

Lines changed: 23 additions & 0 deletions
@@ -37,6 +37,7 @@
3737
import java.util.stream.Stream;
3838

3939
import org.apache.skywalking.oap.server.core.UnexpectedException;
40+
import org.apache.skywalking.oap.server.core.analysis.LayerDefinition;
4041
import org.apache.skywalking.oap.server.core.rule.ext.RuleSetMerger;
4142
import org.apache.skywalking.oap.server.library.module.ModuleManager;
4243
import org.apache.skywalking.oap.server.library.util.ResourceUtils;
@@ -149,9 +150,31 @@ private static Rule parseRule(final String ruleName, final byte[] bytes) {
149150
return null;
150151
}
151152
rule.setName(ruleName);
153+
registerInlineLayers(ruleName, rule);
152154
return rule;
153155
} catch (IOException e) {
154156
throw new UnexpectedException("Load rule " + ruleName + " failed", e);
155157
}
156158
}
159+
160+
/**
161+
* Funnel any inline {@code layerDefinitions:} entries through {@code Layer.register}.
162+
* Conflict checks (reserved-range, name uniqueness, ordinal uniqueness, sealed-state) live
163+
* in {@code Layer.register}; failures here surface with the offending rule name in
164+
* the stack trace, which is enough for an operator to find the bad file.
165+
*/
166+
private static void registerInlineLayers(final String ruleName, final Rule rule) {
167+
final List<LayerDefinition> defs = rule.getLayerDefinitions();
168+
if (defs == null || defs.isEmpty()) {
169+
return;
170+
}
171+
for (final LayerDefinition def : defs) {
172+
try {
173+
def.register();
174+
} catch (RuntimeException e) {
175+
throw new UnexpectedException(
176+
"MAL rule " + ruleName + " layerDefinitions entry rejected: " + def, e);
177+
}
178+
}
179+
}
157180
}
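
Both `registerInlineLayers` helpers (in `LALConfigs` and `Rules`) follow the same pattern: skip empty lists, register each definition, and rethrow any failure with the rule name attached. A self-contained sketch of that pattern is given here with stand-in types. `InlineLayerRegistrationSketch` and `Registrable` are hypothetical, and `IllegalStateException` stands in for OAP's `UnexpectedException`.

```java
import java.util.List;

/** Stand-in for the wrap-with-rule-name pattern used by the MAL and LAL loaders. */
final class InlineLayerRegistrationSketch {

    /** Hypothetical substitute for LayerDefinition; only register() matters here. */
    interface Registrable {
        void register();
    }

    static void registerAll(String ruleName, List<? extends Registrable> defs) {
        if (defs == null || defs.isEmpty()) {
            return; // a rule file without layerDefinitions is valid
        }
        for (Registrable def : defs) {
            try {
                def.register();
            } catch (RuntimeException e) {
                // Keep the original failure as the cause so the stack trace shows both.
                throw new IllegalStateException(
                    "rule " + ruleName + " layerDefinitions entry rejected: " + def, e);
            }
        }
    }

    public static void main(String[] args) {
        Registrable ok = () -> System.out.println("registered");
        registerAll("iot-fleet", List.of(ok));
    }
}
```

Wrapping at the loader keeps the conflict rules in one place (the registration call) while the loader contributes the one thing it alone knows: which rule file was being parsed.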

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/CoreModuleProvider.java

Lines changed: 14 additions & 0 deletions
@@ -25,6 +25,8 @@
2525
import org.apache.skywalking.oap.server.core.analysis.ApdexThresholdConfig;
2626
import org.apache.skywalking.oap.server.core.rule.ext.RuleSetMerger;
2727
import org.apache.skywalking.oap.server.core.analysis.DisableRegister;
28+
import org.apache.skywalking.oap.server.core.analysis.Layer;
29+
import org.apache.skywalking.oap.server.core.analysis.LayerExtensionLoader;
2830
import org.apache.skywalking.oap.server.core.analysis.StreamAnnotationListener;
2931
import org.apache.skywalking.oap.server.core.analysis.meter.MeterEntity;
3032
import org.apache.skywalking.oap.server.core.analysis.meter.MeterSystem;
@@ -184,6 +186,12 @@ public void onInitialized(final CoreModuleConfig initialized) {
184186

185187
@Override
186188
public void prepare() throws ServiceNotProvidedException, ModuleStartException {
189+
// Load external Layer registrations (yaml + SPI) before anything else in Core
190+
// wires up. Downstream modules' prepare()/start() — including MAL/LAL DSL loaders
191+
// that may declare inline `layerDefinitions:` blocks — register on top of this
192+
// baseline. The registry is sealed at the start of notifyAfterCompleted().
193+
LayerExtensionLoader.load();
194+
187195
if (moduleConfig.isActiveExtraModelColumns()) {
188196
DefaultScopeDefine.activeExtraModelColumns();
189197
}
@@ -473,6 +481,12 @@ public void start() throws ModuleStartException {
473481

474482
@Override
475483
public void notifyAfterCompleted() throws ModuleStartException {
484+
// Seal the Layer registry: every module's prepare() and start() has now run, so
485+
// MAL/LAL/SPI/yaml all had their full window to register external layers. After
486+
// this point, Layer.register() throws — UI templates and downstream
487+
// queries see a stable layer set.
488+
Layer.seal();
489+
476490
try {
477491
if (!RunningMode.isInitMode()) {
478492
grpcServer.start();
