solrguard is a local-first search change governance toolkit for Apache Solr. The CLI orchestrates a staged workflow that:
- captures reproducible baseline metadata
- provisions a shadow collection/configset
- loads or samples documents
- loads or extracts queries
- replays baseline vs shadow
- computes ranking and non-ranking diffs
- emits machine-readable artifacts and a single-file HTML report
The design is additive. New feature tracks attach extra artifacts and report sections without changing the base replay/compare contract.
changeset + docs + queries
|
v
validate -> snapshot -> inspect -> preflight
|
v
shadow create -> docs sample/load -> index
|
v
queries extract/load -> replay -> compare
|
+--> rewrite diff
+--> explain capture
+--> vector/hybrid scenario replay
+--> performance capture
+--> root-cause analysis
+--> recommendations
+--> LTR impact
+--> optional plugin execution
|
v
report.json + report.html + run_manifest.json + plugins.json
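The staged flow above can be pictured as an ordered list of stage functions that share a context and record their outcome in a manifest. The following is a minimal sketch only, not the actual cli.py: `Stage`, `RunManifest`, and `run_stages` are hypothetical names, and the rule that a failing optional stage does not stop the run is an assumption drawn from the additive design goal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration; the real orchestration lives in cli.py.
StageFn = Callable[[Dict[str, object]], Dict[str, object]]


@dataclass
class Stage:
    name: str
    run: StageFn
    optional: bool = False  # assumed: optional tracks never block the core flow


@dataclass
class RunManifest:
    stages: List[Dict[str, object]] = field(default_factory=list)


def run_stages(stages: List[Stage], context: Dict[str, object]) -> RunManifest:
    """Run stages in order and record each outcome in the manifest."""
    manifest = RunManifest()
    for stage in stages:
        try:
            # Each stage returns new keys that later stages can read.
            context.update(stage.run(context))
            manifest.stages.append({"name": stage.name, "status": "ok"})
        except Exception as exc:
            manifest.stages.append(
                {"name": stage.name, "status": "failed", "error": str(exc)}
            )
            if not stage.optional:
                break  # a failed core stage stops the run
    return manifest
```

Keeping each stage a plain function that takes and returns dictionaries is one way to keep orchestration thin: adding a feature track means appending a `Stage`, not editing the loop.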
flowchart LR
A["Interfaces: CLI / API / CI / Plugins"] --> B["Analysis Layer"]
B --> C["Governance Layer"]
C --> D["Runtime Integration Layer"]
D --> E["Delivery Layer"]
B --- B1["Replay/Compare"]
B --- B2["Compatibility + Capability Detection"]
B --- B3["Segmentation + Privacy Filters"]
C --- C1["Policies + Gates"]
C --- C2["Approvals + Exceptions"]
C --- C3["Promotion State + Audit"]
D --- D1["Security + Redaction"]
D --- D2["Observability + Webhooks"]
D --- D3["Rollout Orchestration"]
E --- E1["Artifacts + Reports"]
E --- E2["Docker/Helm/API Service"]
flowchart LR
A["Change Proposal"] --> B["Detect Solr Version/Capabilities"]
B --> C["Baseline vs Candidate Analysis"]
C --> D["Policy Evaluation"]
D --> E{"Pass?"}
E -->|"No"| F["Exception Required or Reject"]
E -->|"Yes"| G["Approval Metadata"]
F --> G
G --> H["Rollout and Rollback Plan"]
H --> I["Post-cutover Verification"]
I --> J["Audit + Export-safe Artifacts"]
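The pass/fail branch in the flow above can be sketched as a small pure function. This is an illustration under assumptions, not the governance package's API: `evaluate_promotion`, its arguments, and the `next_step` labels are hypothetical.

```python
from typing import Dict, List, Optional


def evaluate_promotion(
    violations: List[str], exception_id: Optional[str] = None
) -> Dict[str, object]:
    """Mirror the flowchart: pass -> approval; fail -> exception or reject."""
    passed = not violations
    if passed:
        next_step = "approval_metadata"
    elif exception_id:
        # A recorded exception lets a failing change continue to approval.
        next_step = "approval_metadata"
    else:
        next_step = "reject"
    return {
        "passed": passed,
        "violations": list(violations),
        # Only keep the exception reference when it was actually needed.
        "exception_id": exception_id if not passed else None,
        "next_step": next_step,
    }
```

Returning the violations and the exception reference together keeps the decision auditable: the same record explains both why the gate failed and why the change was still allowed to proceed.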
changeset security/audit config
|
v
secret resolution (env/file/object refs)
|
v
auth material build (none/basic/bearer/mtls/plugin)
|
v
Solr HTTP clients (baseline/shadow)
|
+--> redaction engine (manifests/reports/API payload logs)
+--> audit trail writer (audit.json + API audit logs)
+--> privacy/retention enforcement (profile-driven artifact suppression)
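The first and last steps of this chain, secret resolution and redaction, can be sketched as follows. This is a minimal illustration: the `env:` / `file:` reference syntax, the `SENSITIVE_KEYS` set, and both function names are assumptions, not the security package's actual contract.

```python
import os
from pathlib import Path
from typing import Any

MASK = "***REDACTED***"
# Assumed key names; a real deployment would make this configurable.
SENSITIVE_KEYS = {"password", "token", "authorization", "secret"}


def resolve_secret(ref: str) -> str:
    """Resolve a reference such as 'env:SOLR_PASSWORD' or 'file:/run/secrets/pw'."""
    scheme, _, target = ref.partition(":")
    if scheme == "env":
        value = os.environ.get(target)
        if value is None:
            raise KeyError(f"environment variable not set: {target}")
        return value
    if scheme == "file":
        return Path(target).read_text(encoding="utf-8").strip()
    raise ValueError(f"unsupported secret reference scheme: {scheme!r}")


def redact(value: Any) -> Any:
    """Return a copy of a JSON-like value with sensitive keys masked."""
    if isinstance(value, dict):
        return {
            key: MASK if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Running `redact` over a manifest or report payload just before it is written keeps resolved secrets in memory only, which matches the diagram's placement of the redaction engine after the HTTP clients are built.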
flowchart LR
A["Git Configset"] --> C["Git vs Live Diff"]
B["Live Solr Cluster"] --> C
C --> D["Canary Plan"]
D --> E["Alias Swap Dry-run"]
E --> F["Policy and Approval Check"]
F --> G["Execute in Delivery System"]
G --> H["Post-cutover Verify"]
H --> I["Rollback Plan (if required)"]
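The alias swap dry-run step can be illustrated by building the requests without sending them. The sketch below assumes the swap is done with the Solr Collections API `CREATEALIAS` action, which repoints an existing alias. `alias_swap_plan` and the shape of the returned plan are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode


def alias_swap_plan(
    base_url: str, alias: str, current: str, candidate: str
) -> Dict[str, object]:
    """Build cutover and rollback URLs for an alias swap without executing them."""

    def create_alias(target: str) -> str:
        # CREATEALIAS on an existing alias name repoints it to the given collection.
        query = urlencode(
            {"action": "CREATEALIAS", "name": alias, "collections": target}
        )
        return f"{base_url.rstrip('/')}/admin/collections?{query}"

    return {
        "dry_run": True,
        "alias": alias,
        "cutover": create_alias(candidate),
        "rollback": create_alias(current),
    }
```

Because the plan is plain data, it can be attached to approval metadata and handed to the delivery system for execution, which is where the flow above places the actual cutover.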
- schema_lens/cli.py
- schema_lens/config.py
- schema_lens/errors.py
- schema_lens/api/
cli.py owns stage ordering, artifact paths, and run manifest updates. Feature packages expose
small assembler functions so orchestration stays thin. api/ exposes service-mode wrappers over
the same core workflow with queued run execution and artifact serving.
- schema_lens/http/
- schema_lens/solr/
- schema_lens/shadow/
These modules isolate Solr HTTP concerns, retries, admin endpoints, schema APIs, configset handling, and collection lifecycle management.
- schema_lens/changesets/
- schema_lens/data/
- schema_lens/queries/
- schema_lens/schema/
- schema_lens/snapshot/
These packages parse/validate changesets, sample documents, extract queries from files/logs, build schema dependency graphs, and capture deterministic baseline snapshots.
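One common way to make a snapshot deterministic is to serialize it canonically and fingerprint the result, so that two captures of the same state compare equal byte for byte. The sketch below shows that general technique. `snapshot_digest` is a hypothetical helper and is not taken from the snapshot package.

```python
import hashlib
import json
from typing import Any, Dict


def snapshot_digest(snapshot: Dict[str, Any]) -> str:
    """Return a SHA-256 hex digest of a canonical JSON rendering of the snapshot."""
    canonical = json.dumps(
        snapshot,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=True,       # stable bytes regardless of locale
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Volatile fields such as timestamps or node names would need to be excluded or normalized before hashing, otherwise every capture produces a different digest.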
- schema_lens/replay/
- schema_lens/compare/
- schema_lens/vector/
- schema_lens/compat/
replay executes lexical baseline/shadow requests. vector adds scenario-based replay and
client-side hybrid simulation. compare computes ranking, facet, filter, sort, rewrite, explain,
gate, and report-ready summaries. compat includes typed version/capability models, a
version matrix, runtime endpoint probes, and adapters so that optional features degrade cleanly
across Solr 8/9/10.
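To make the ranking part of compare concrete, here is a minimal sketch of a per-query diff over two result lists. The metric choice (Jaccard overlap at k plus position deltas) and the name `ranking_diff` are assumptions for illustration, since the text does not specify which metrics compare uses.

```python
from typing import Dict, List


def ranking_diff(baseline: List[str], shadow: List[str], k: int = 10) -> Dict[str, object]:
    """Compare the top-k document ids returned by baseline and shadow."""
    b, s = baseline[:k], shadow[:k]
    b_set, s_set = set(b), set(s)
    union = b_set | s_set
    # Two empty result lists are treated as identical.
    jaccard = len(b_set & s_set) / len(union) if union else 1.0
    shadow_pos = {doc: i for i, doc in enumerate(s)}
    # Positive delta: the document moved down in the shadow ranking.
    moved = {
        doc: shadow_pos[doc] - i
        for i, doc in enumerate(b)
        if doc in shadow_pos and shadow_pos[doc] != i
    }
    return {
        "jaccard_at_k": jaccard,
        "dropped": [doc for doc in b if doc not in s_set],
        "added": [doc for doc in s if doc not in b_set],
        "moved": moved,
    }
```

For example, `ranking_diff(["a", "b", "c"], ["b", "a", "d"], k=3)` reports an overlap of 0.5, with `c` dropped, `d` added, and `a` and `b` swapping positions.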
- schema_lens/perf/
- schema_lens/rootcause/
- schema_lens/recommend/
- schema_lens/ltr/
- schema_lens/env_compare/
- schema_lens/monitor/
- schema_lens/plugins/
- schema_lens/security/
- schema_lens/observability/
- schema_lens/governance/
- schema_lens/rollout/
- schema_lens/segments/
- schema_lens/privacy/
These packages are optional, additive tracks:
- perf: latency, cache, and index-footprint estimation
- rootcause: deterministic diagnosis rules
- recommend: action-oriented follow-ups from root causes
- ltr: feature-log aware rerank drift
- env_compare: cross-cluster drift
- monitor: snapshot-vs-current drift history
- plugins: optional extension SDK (contracts, registry, loader, compatibility checks)
- security: auth resolution, secret loading, redaction, audit trail, execution profiles
- observability: run events, webhook sinks, Prometheus text export, OTel-style stage spans
- governance: approvals, policy bundles, exceptions, promotion state, optional manifest signing
- rollout: GitOps drift checks, canary plan generation, alias swap/rollback plans, post-cutover verification
- segments: multi-tenant/segment aggregation and segment-level policy checks
- privacy: PII masking, export-safe transformations, retention pruning
- schema_lens/report/
- schema_lens/dashboard/
- schema_lens/ci/
report builds JSON and HTML bundles. dashboard serves a read-only local UI over artifacts on
disk. ci formats PR-friendly markdown summaries.
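A PR-friendly summary of the kind ci produces might look like the sketch below. The input shape (a list of gate dictionaries with `name`, `passed`, and optional `detail`) and the function name `format_pr_summary` are hypothetical. The real compare.json layout is not described here.

```python
from typing import Dict, List


def format_pr_summary(run_id: str, gates: List[Dict[str, object]]) -> str:
    """Render gate results as a Markdown table suitable for a PR comment."""
    lines = [
        f"### solrguard run {run_id}",
        "",
        "| Gate | Status | Detail |",
        "| --- | --- | --- |",
    ]
    for gate in gates:
        status = "pass" if gate["passed"] else "FAIL"
        lines.append(f"| {gate['name']} | {status} | {gate.get('detail', '')} |")
    failed = sum(1 for gate in gates if not gate["passed"])
    lines += ["", f"{failed} of {len(gates)} gates failed."]
    return "\n".join(lines)
```

Building the comment from artifacts on disk, and not from live Solr calls, keeps the CI step reproducible from the same files the report and dashboard read.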
- docker/
- helm/solrguard/
- scripts/release/
- .github/workflows/release.yml
These assets support enterprise deployment paths (containerized CLI/API mode, Helm-managed service deployment, and release artifact generation).
Core run artifacts:
- run_manifest.json
- snapshot*.json
- compat.json
- schema_risk.json
- shadow.json
- replay.json
- compare.json
- report.json
- report.html
Optional additive artifacts:
- docs_sample.jsonl
- queries_extracted.jsonl
- vector_validation.json
- hybrid_sensitivity.json
- perf_metrics.json
- rootcauses.json
- recommendations.json
- env_compare.json
- ltr_impact.json
- plugins.json
- audit.json
- governance.json
- observability_events.jsonl
- otel_spans.json
- webhook_deliveries.json
- prometheus_metrics.prom
- segments.json
- privacy.json
- latest_monitor.json
- monitor_history.jsonl
Missing optional capabilities must serialize as:
{"enabled": false, "reason": "..."}

This keeps downstream report/dashboard code stable.
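A reader of optional artifacts can lean on this convention by substituting the same disabled shape whenever a file is absent or unreadable. The sketch below is an assumption about how a consumer might do that. `disabled` and `load_optional` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Dict


def disabled(reason: str) -> Dict[str, Any]:
    """Build the placeholder used for a capability that did not run."""
    return {"enabled": False, "reason": reason}


def load_optional(run_dir: Path, name: str) -> Any:
    """Load an optional artifact, falling back to the disabled placeholder."""
    path = run_dir / name
    if not path.is_file():
        return disabled(f"{name} not produced for this run")
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return disabled(f"{name} is not valid JSON: {exc.msg}")
```

With this in place, report and dashboard code can always check `section.get("enabled")` on a dictionary and never has to branch on a missing file.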
Plugin runtime is intentionally narrow:
- Discovery: built-in + local directories + Python entry points
- Contract check: metadata and version compatibility
- Lifecycle: validate -> initialize -> (phase hooks) -> execute -> cleanup
- Isolation: plugin failures are recorded in artifacts and only block runs in strict mode
Plugin boundary diagram:
changeset/plugins config
        |
        v
PluginLoader ----> PluginRegistry
        |                 |
        v                 v
PluginRuntime ----> phase hooks
(plugin_service)     - query/doc source
        |            - auth/replay/analyze
        v            - gate/report/rollout
out/<run>/plugins/*  - observability events
        |
        v
plugins.json + compare.json.plugins + report.json.plugin_report_sections
Core replay/compare logic stays in first-party packages; plugins are additive and optional.
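The lifecycle and isolation rules can be sketched as a small runner. This is an illustration under assumptions: the `Plugin` protocol, `run_plugins`, and the returned record shape are hypothetical and do not reproduce the plugin SDK's contracts, and phase hooks are omitted for brevity.

```python
from typing import Any, Dict, List, Protocol


class Plugin(Protocol):
    name: str

    def validate(self) -> None: ...
    def initialize(self) -> None: ...
    def execute(self, context: Dict[str, Any]) -> Dict[str, Any]: ...
    def cleanup(self) -> None: ...


def run_plugins(
    plugins: List[Plugin], context: Dict[str, Any], strict: bool = False
) -> Dict[str, Any]:
    """Run each plugin through validate -> initialize -> execute -> cleanup."""
    records: List[Dict[str, Any]] = []
    blocked = False
    for plugin in plugins:
        record: Dict[str, Any] = {"name": plugin.name, "status": "ok"}
        try:
            plugin.validate()
            plugin.initialize()
            try:
                record["result"] = plugin.execute(context)
            finally:
                plugin.cleanup()  # runs whenever initialize succeeded
        except Exception as exc:
            # Failures are recorded, never silently dropped.
            record.update(status="failed", error=str(exc))
        records.append(record)
        if record["status"] == "failed" and strict:
            blocked = True  # only strict mode lets a plugin block the run
            break
    return {"plugins": records, "blocked": blocked}
```

In non-strict mode a failing plugin leaves a `failed` record and the remaining plugins still run, so the core replay/compare results are unaffected.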
- Existing commands stay valid.
- Existing artifact keys are never removed in-place.
- New sections are additive only.
- Feature packages must tolerate partial artifacts and missing Solr capabilities.
- Deterministic logic is preferred over opaque inference.
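As an illustration of tolerating missing Solr capabilities deterministically, a feature package could gate itself on a version table and report why it is disabled. The sketch below is hypothetical: `MIN_VERSION`, `parse_version`, and `capability` are not the compat package's API, and the single table entry is only an example (dense vector fields were introduced in Solr 9.0).

```python
from typing import Dict, Tuple

# Illustrative table only: feature name -> minimum (major, minor) Solr version.
MIN_VERSION: Dict[str, Tuple[int, int]] = {
    "dense_vector": (9, 0),
}


def parse_version(text: str) -> Tuple[int, int]:
    """Parse '9.4.1' or '9' into a (major, minor) tuple."""
    parts = (text.split(".") + ["0"])[:2]
    return int(parts[0]), int(parts[1])


def capability(feature: str, solr_version: str) -> Dict[str, object]:
    """Return an enabled flag plus a reason when the feature is unavailable."""
    required = MIN_VERSION.get(feature)
    if required is None:
        return {"enabled": False, "reason": f"unknown feature: {feature}"}
    if parse_version(solr_version) < required:
        return {
            "enabled": False,
            "reason": f"{feature} requires Solr >= {required[0]}.{required[1]}",
        }
    return {"enabled": True}
```

A static table like this only covers version-based differences. Runtime endpoint probes would still be needed for features that depend on cluster configuration instead of the release.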
- Fast unit tests cover:
  - parser/validator logic
  - diff metrics
  - root-cause and recommendation rules
  - performance summarization
  - env compare/auth helpers
  - monitor history and drift math
  - LTR feature parsing
- Docker integration tests cover Solr-dependent behavior.
- Smoke targets (make smoke, make smoke-vector, make smoke-matrix) validate end-to-end slices against the bundled SolrCloud example.