solrguard is a local-first search change governance toolkit for Apache Solr. It helps you answer one question before you
ship a schema or query-default change:
"What will this change do to ranking, parser behavior, facets, latency, and rollout risk?"
This guide is the operator-focused version of the README. It is organized around what you want to do, not around internal package layout.
If you are new to the tool, use this order:
- Run the basic smoke path.
- Open `report.html`.
- Inspect `compare.json`.
- Add only the extra tracks you need: rewrite, vector, performance, env compare, monitor.
Fastest first run:
```
make dev-up
make demo-setup
.venv/bin/solrguard run examples/changesets/fieldtype-change.yaml --out out/demo
open out/demo/report.html
```

At a high level, solrguard:
- captures baseline collection metadata
- creates a shadow collection
- applies your planned schema/config/query-default change
- indexes representative documents into shadow
- replays representative queries against baseline and shadow
- computes diffs and optional analysis tracks
- emits reproducible JSON and HTML artifacts
Core outputs:
- `report.html`: easiest human review artifact
- `report.json`: structured report bundle for dashboards or automation
- `compare.json`: ranking/facet/filter/sort comparison payload
- `run_manifest.json`: exact input/settings manifest for reproducibility
Use this table as the shortest path to the right command.
| Goal | Command | Primary output |
|---|---|---|
| Validate a changeset before running | `solrguard validate <changeset>` | terminal validation result |
| Inspect a live collection | `solrguard inspect --solr-url ... --collection ... --out inspect.json` | `inspect.json` |
| Capture a reproducible baseline snapshot | `solrguard snapshot --solr-url ... --collection ... --out out/snapshot` | `snapshot.json` bundle |
| Run full baseline vs shadow evaluation | `solrguard run <changeset> --out out/run` | `report.html`, `report.json`, `compare.json` |
| Replay only, without full run orchestration | `solrguard replay ... --out replay.json` | `replay.json` |
| Compare an existing replay payload | `solrguard compare --replay replay.json --out compare.json` | `compare.json` |
| Generate reports from existing compare data | `solrguard report --compare compare.json --manifest run_manifest.json --out out/report` | `report.json`, `report.html` |
| Apply rollout policy thresholds | `solrguard gate --compare compare.json --policy policy.yaml` | exit code + terminal summary |
| Produce CI markdown summary | `solrguard ci summarize --compare compare.json --out summary.md` | `summary.md` |
| Compare two live environments | `solrguard compare-env --env1 ... --env2 ... --queries ... --out out/env_compare` | `env_compare.json`, `report.html` |
| Generate recommendations from an existing run | `solrguard recommend --run out/run --out recommendations.json` | `recommendations.json` |
| Serve a read-only artifact dashboard | `solrguard serve --run out/run --port 8080` | local dashboard |
| Append drift history from a baseline run/snapshot | `solrguard monitor --baseline-snapshot out/run --queries ... --out out/monitor` | `latest_monitor.json`, `monitor_history.jsonl` |
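For a pre-merge check, these commands compose into one pipeline. The sketch below is illustrative: it reuses the shipped demo changeset and performance policy as placeholders, and it assumes `gate` exits non-zero when a threshold is violated (the table above only documents "exit code + terminal summary").

```bash
#!/usr/bin/env bash
# Minimal CI sketch: validate, run, gate, summarize.
# CHANGESET and POLICY are placeholders -- substitute your own files.
set -euo pipefail

CHANGESET=examples/changesets/fieldtype-change.yaml
POLICY=examples/policy/perf_gate_default.yaml
OUT=out/ci_run

.venv/bin/solrguard validate "$CHANGESET"
.venv/bin/solrguard run "$CHANGESET" --out "$OUT"

# Assumed: a non-zero exit code signals a policy violation.
# Record the failure but still emit the markdown summary for reviewers.
GATE_FAILED=0
.venv/bin/solrguard gate --compare "$OUT/compare.json" --policy "$POLICY" || GATE_FAILED=1

.venv/bin/solrguard ci summarize --compare "$OUT/compare.json" --out "$OUT/summary.md"
exit "$GATE_FAILED"
```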
Baseline-vs-shadow ranking and facet comparison is the default reason to use the tool.
It measures:
- top-K overlap
- Jaccard
- Kendall tau
- moved, dropped, and newly introduced documents
- `numFound` deltas
- facet-count deltas
- sort instability
Use it when:
- changing a field type
- changing analyzers
- changing query defaults
- validating a patch before rollout
Synonym and stopword file rewrites are supported through two change types:
- `schema.synonym.update`
- `schema.stopwords.update`
Use it when:
- you want to patch `synonyms.txt` or `stopwords.txt`
- you want a realistic shadow configset instead of only in-memory parameter changes
Best example:
examples/changesets/procurement-synonym-rewrite.yaml
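A low-friction way to iterate on a synonym or stopword change is to start from that shipped example rather than writing a changeset from scratch. The copied filename below is a placeholder; the actual changeset keys live in the example file, not in this sketch.

```bash
# Copy the shipped example and edit it (e.g. the schema.synonym.update entries).
cp examples/changesets/procurement-synonym-rewrite.yaml my-synonym-change.yaml

# Cheap structural check before paying for a full baseline-vs-shadow run.
.venv/bin/solrguard validate my-synonym-change.yaml
.venv/bin/solrguard run my-synonym-change.yaml --out out/synonym_change
```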
The rewrite-diff track captures parser behavior changes such as:
- clause explosions
- added/removed terms
- synonym expansion changes
- parsed query shape drift
Use it when:
- synonym changes are risky
- `mm`, `qf`, or parser defaults changed
- you need evidence beyond ranking movement alone
Best example:
examples/changesets/procurement-synonym-rewrite.yaml
Key output:
`compare.json` -> `rewrite_diff`
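To pull just the parser-level evidence out of a finished run, a `jq` query is usually enough. Only the `rewrite_diff` key itself is documented here; its internal fields are not, so dump the whole object first and explore from there.

```bash
# Print the full rewrite_diff section of a finished run.
jq '.rewrite_diff' out/run/compare.json

# Rough size check -- assumes rewrite_diff is an array or object of per-query
# entries; verify against your own output before scripting on it.
jq '.rewrite_diff | length' out/run/compare.json
```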
The vector/hybrid track adds:
- lexical-only scenarios
- vector-only scenarios
- hybrid scenarios
- vector schema validation
- hybrid contribution estimates
- weight sensitivity sweeps
Use it when:
- evaluating a new embedding field
- testing hybrid lexical/vector blends
- comparing sensitivity to hybrid weights
Best example:
examples/changesets/vector-hybrid-demo.yaml
Key outputs:
- `vector_validation.json`
- `replay_<scenario>.json`
- `compare.json` -> `vector_hybrid`
- `hybrid_sensitivity.json`
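A typical vector/hybrid session looks like the sketch below. It assumes the shipped demo changeset and the `--enable-sensitivity` flag used in the recipes later in this guide; the `jq` paths touch only keys and filenames named above, and the artifacts are assumed to land in the run output directory.

```bash
# Run the hybrid demo with the weight-sensitivity sweep enabled.
.venv/bin/solrguard run examples/changesets/vector-hybrid-demo.yaml \
  --out out/vector_demo --enable-sensitivity

# Vector schema validation is emitted as its own artifact.
jq . out/vector_demo/vector_validation.json

# Scenario comparison (lexical vs vector vs hybrid) and the sensitivity sweep.
jq '.vector_hybrid' out/vector_demo/compare.json
jq . out/vector_demo/hybrid_sensitivity.json
```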
The performance track estimates:
- latency regressions
- `QTime` regressions
- cache churn
- index size effects
Use it when:
- a change may alter latency or cache behavior
- you want policy gates on performance, not only relevance
Best examples:
- `examples/changesets/perf_estimator_example.yaml`
- `examples/policy/perf_gate_default.yaml`
Key output:
`perf_metrics.json`
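Because `gate` reports through its exit code, a performance policy can block a rollout script directly. A small sketch, assuming a completed run in `out/perf_demo` and that `perf_metrics.json` is written alongside the other run artifacts:

```bash
# Fail the rollout step if the performance policy thresholds are violated.
if .venv/bin/solrguard gate \
     --compare out/perf_demo/compare.json \
     --policy examples/policy/perf_gate_default.yaml; then
  echo "perf gate passed -- continue rollout"
else
  echo "perf gate failed -- review perf_metrics.json before rolling out" >&2
  jq . out/perf_demo/perf_metrics.json >&2
  exit 1
fi
```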
The root-cause and recommendation layer converts comparison evidence into deterministic findings and next steps.
Examples:
- prefix matching removed
- title boost reduced
- minimum-should-match became stricter
- vector dominance increased
- cache or latency regression
Use it when:
- you want faster triage after a failing run
- you want actionable hints for the next iteration
Key outputs:
- `rootcauses.json`
- `recommendations.json`
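After a failing run, triage is usually one command plus a quick look at the two JSON artifacts. The sketch assumes a finished run in `out/run` and that both files land in that directory; their internal shape is not documented here, so dump them whole.

```bash
# Generate recommendations from an existing run directory.
.venv/bin/solrguard recommend --run out/run --out out/run/recommendations.json

# Deterministic findings and suggested next changes, dumped for review.
jq . out/run/rootcauses.json
jq . out/run/recommendations.json
```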
Environment compare (`compare-env`) compares two live Solr environments without creating a shadow collection.
Use it when:
- staging and production are drifting
- two regions behave differently
- you need live-vs-live comparison rather than planned change simulation
Best example:
```
.venv/bin/solrguard compare-env \
  --env1 examples/envs/prod_us.yaml \
  --env2 examples/envs/prod_eu.yaml \
  --queries examples/queries/env_compare_queries.jsonl \
  --out out/env_compare
```

Key outputs:
- `env_compare.json`
- `report.html`
`serve` lets you inspect prior artifacts in a lightweight local dashboard.
`monitor` lets you append drift summaries over time:
- `latest_monitor.json`
- `monitor_history.jsonl`
Use these when:
- you want read-only artifact browsing
- you want to track drift after a baseline run
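The two commands compose naturally: serve a past run for browsing, and append a monitor sample on a schedule. The sketch below only uses flags from the command table; the queries file is a placeholder, and the scheduling is left to cron or your own tooling.

```bash
# Read-only dashboard over an existing run directory.
.venv/bin/solrguard serve --run out/run --port 8080

# Periodic drift sample against the same baseline (run this from cron/a scheduler).
.venv/bin/solrguard monitor \
  --baseline-snapshot out/run \
  --queries examples/queries/env_compare_queries.jsonl \
  --out out/monitor

# History accumulates as JSON lines; the latest summary is a separate file.
tail -n 3 out/monitor/monitor_history.jsonl
jq . out/monitor/latest_monitor.json
```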
Basic field-type change demo:

```
make dev-up
make demo-setup
.venv/bin/solrguard run examples/changesets/fieldtype-change.yaml --out out/demo
open out/demo/report.html
```

Procurement synonym rewrite demo:

```
make dev-up
make demo-setup-procurement
.venv/bin/solrguard run examples/changesets/procurement-synonym-rewrite.yaml --out out/procurement_demo
open out/procurement_demo/report.html
```

Vector/hybrid demo:

```
make dev-up
make demo-setup-vector
.venv/bin/solrguard run examples/changesets/vector-hybrid-demo.yaml --out out/vector_demo --enable-sensitivity
open out/vector_demo/report.html
```

Performance gate demo:

```
make dev-up
make demo-setup
.venv/bin/solrguard run examples/changesets/perf_estimator_example.yaml --out out/perf_demo
.venv/bin/solrguard gate --compare out/perf_demo/compare.json --policy examples/policy/perf_gate_default.yaml
```

Environment compare demo:

```
.venv/bin/solrguard compare-env \
  --env1 examples/envs/prod_us.yaml \
  --env2 examples/envs/prod_eu.yaml \
  --queries examples/queries/env_compare_queries.jsonl \
  --out out/env_compare
open out/env_compare/report.html
```

Open these in order:
- `report.html`
- `report.json`
- `compare.json`
What to look for:
- `summary`: overall overlap and high-risk rate
- `top_regressions`: fastest way to find damaging queries
- `rewrite_diff`: parser-level behavior change
- `vector_hybrid`: lexical vs vector scenario comparison
- `hybrid_sensitivity`: how fragile the chosen weights are
- `performance`: latency/cache/index impact
- `root_causes`: deterministic diagnosis
- `recommendations`: suggested next changes
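If you review a run from a terminal or CI log instead of the HTML report, the keys above can be pulled directly. This guide does not spell out which of them live in `report.json` versus `compare.json`, so the sketch queries both files and lets absent keys print as null.

```bash
# Quick terminal review of a finished run; missing keys simply show as null.
for f in out/run/report.json out/run/compare.json; do
  echo "== $f =="
  jq '{summary, top_regressions, root_causes, recommendations}' "$f"
done
```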
For documents:
- use `data.docs_source.type=file` for reproducible local tests
- use `data.docs_source.type=solr` when you need a realistic sample from a live collection
For queries:
- use `queries.source.type=file` for controlled benchmarks
- use `queries.source.type=log` for production realism
General rule:
- if you are debugging behavior, keep inputs small and reproducible
- if you are deciding rollout risk, use realistic docs and realistic queries
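As a rough illustration of how the two source switches combine, here is a hypothetical changeset fragment. Only the dotted paths named above are used; the nesting is inferred from them, and every other required key is omitted, so treat this as a shape hint and check the shipped changesets under `examples/changesets/` for the real schema.

```bash
# Hypothetical fragment -- structure inferred from the dotted paths above,
# not copied from a real changeset. Merge into a full changeset by hand.
cat > data-sources-fragment.yaml <<'EOF'
data:
  docs_source:
    type: file     # reproducible local tests; "solr" samples a live collection
queries:
  source:
    type: file     # controlled benchmarks; "log" replays production queries
EOF
```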
Dedicated Docker-backed integration coverage currently exists for:
- base `run` smoke path
- rewrite-diff smoke path
- vector/hybrid smoke path
- smoke-matrix target
- environment compare smoke path
- monitor smoke path
- serve dashboard smoke path
Files:
- `tests/integration/test_run_smoke.py`
- `tests/integration/test_rewrite_diff_smoke.py`
- `tests/integration/test_vector_hybrid_smoke.py`
- `tests/integration/test_smoke_matrix.py`
- `tests/integration/test_ops_commands_smoke.py`
This means the core workflows are exercised end-to-end.
Not every newer feature has its own dedicated end-to-end smoke test yet. The following are implemented, documented, and unit-tested, but should be treated as not yet fully covered by a feature-specific, Docker-backed end-to-end smoke test:
- performance analysis
- root-cause analysis
- recommendations
- LTR analysis
That is good enough for continued development, but not the same as saying every feature has full integration coverage.
For production-like use, this is the safest order:
- `validate`
- `snapshot`
- `run`
- inspect `report.html`
- apply `gate`
- optionally use `recommend`, `serve`, `compare-env`, or `monitor`
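Put together as one script, that order looks roughly like the sketch below. The Solr URL, collection, changeset, and policy file are placeholders, and the snapshot/report paths are assumptions; the individual commands are the ones documented in the command table.

```bash
#!/usr/bin/env bash
# Production-like rollout check, following the recommended order above.
set -euo pipefail

SOLR_URL="http://localhost:8983/solr"   # placeholder
COLLECTION="products"                   # placeholder
CHANGESET="my-change.yaml"              # placeholder
POLICY="policy.yaml"                    # placeholder

.venv/bin/solrguard validate "$CHANGESET"
.venv/bin/solrguard snapshot --solr-url "$SOLR_URL" --collection "$COLLECTION" --out out/snapshot
.venv/bin/solrguard run "$CHANGESET" --out out/run

# Human review first, then the policy gate decides whether rollout proceeds.
open out/run/report.html   # macOS; use xdg-open on Linux
.venv/bin/solrguard gate --compare out/run/compare.json --policy "$POLICY"

# Optional follow-ups once the gate passes (or to triage when it does not).
.venv/bin/solrguard recommend --run out/run --out out/run/recommendations.json
.venv/bin/solrguard serve --run out/run --port 8080
```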