feat(clp-s): Support count and count-by-time aggregation output to stdout.#2342
feat(clp-s): Support count and count-by-time aggregation output to stdout.#2342davemarco wants to merge 27 commits into
Conversation
…e results cache. Allows --count and --count-by-time to be used with the results-cache output handler, writing aggregation results directly to MongoDB via _id-keyed atomic $inc upserts so that partial results from multiple search processes merge correctly without a reducer.
…he search usage text outputs to the reducer.
…nation and simplify comments.
…dout. Adds `--count` and `--count-by-time` to the stdout output handler, emitting per-archive results as newline-delimited JSON. The options may be given without naming the `stdout` handler (e.g. `clp-s s <archives> <query> --count`), and the kv-ir search path now reports these aggregations as unsupported, consistent with the reducer and results-cache handlers.
WalkthroughSearch aggregation options are centralized in ChangesSearch aggregation plumbing
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…ers. - Take string_view for the results-cache handler uri/collection/dataset/archive_id params and materialize std::string only at the mongocxx call sites. - Drop the redundant ::clp_s:: qualifier on the count handlers' base class. - Rename the count-by-time write timestamp param to timestamp_ms to match the bucket size unit.
…n handler. Merges the results-cache count work and applies the same review fixes to AggregationToStdoutOutputHandler: take string_view for archive_id, drop the redundant ::clp_s:: qualifier on the base class, rename the count-by-time bucket size to count_by_time_bucket_size_ms, and rename the count-by-time write timestamp param to timestamp_ms.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@components/core/src/clp_s/OutputHandlerImpl.cpp`:
- Around line 240-248: CountStdoutOutputHandler::finish is returning early when
m_count is zero, which skips emitting the archive JSON for --count. Update the
finish method so it always writes one JSON object per archive, including the
archive_id and a count of 0, and only uses the success return after emitting the
result; keep the existing JSON construction in CountStdoutOutputHandler::finish
and remove the zero-count early exit.
In `@components/core/src/clp_s/OutputHandlerImpl.hpp`:
- Around line 437-445: The bucket calculation in OutputHandlerImpl::write is
using truncating division, so negative timestamps are assigned to the wrong time
bucket. Update the bucket derivation to use floor-style bucketing for
epochtime_t values, and make the same change in the sibling count-by-time
handlers so their bucket logic stays consistent with
m_count_by_time_bucket_size_ms.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2441ee68-0def-417a-aa79-bae16df62ca7
📒 Files selected for processing (6)
components/core/src/clp_s/CommandLineArguments.cppcomponents/core/src/clp_s/CommandLineArguments.hppcomponents/core/src/clp_s/OutputHandlerImpl.cppcomponents/core/src/clp_s/OutputHandlerImpl.hppcomponents/core/src/clp_s/clp-s.cppcomponents/core/src/clp_s/kv_ir_search.cpp
davidlion
left a comment
There was a problem hiding this comment.
Very minor comments. We're currently in discussion about the best way to go about de-duplicating the aggregator x output-handler logic.
| search_options.add(match_options); | ||
|
|
||
| po::options_description aggregation_options("Aggregation Controls"); | ||
| // clang-format off |
There was a problem hiding this comment.
it does look slightly worse without it. like it wraps the fields and descriptions on the same line. in my opinion
it reads nicer without the auto wrapping.
…tion_for_handler and a <clp_s/...> include in OutputHandlerImpl.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/core/src/clp_s/OutputHandlerImpl.hpp (1)
392-414: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winEmit zero-count archives from
finish().The paired
CountStdoutOutputHandler::finish()implementation currently returns early whenm_count == 0, so searched archives with zero matches produce no JSON at all. That breaks the advertised--countcontract of one object per archive and makes “0 matches” indistinguishable from “archive was never searched”.Proposed fix in
components/core/src/clp_s/OutputHandlerImpl.cppauto CountStdoutOutputHandler::finish() -> ErrorCode { - if (0 == m_count) { - return ErrorCode::ErrorCodeSuccess; - } - nlohmann::json result; result[constants::results_cache::search::cArchiveId] = m_archive_id; result[constants::results_cache::search::cCount] = m_count; std::cout << result.dump() << '\n'; return ErrorCode::ErrorCodeSuccess; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/core/src/clp_s/OutputHandlerImpl.hpp` around lines 392 - 414, CountStdoutOutputHandler::finish() currently skips emitting output when m_count is zero, so zero-match archives never produce a JSON record. Update the finish() implementation for CountStdoutOutputHandler to always emit one count object per archive_id, even when the count is 0, and then return success. Keep the behavior consistent with the write(std::string_view) counter logic and use the existing CountStdoutOutputHandler::finish() path to locate the fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@components/core/src/clp_s/OutputHandlerImpl.hpp`:
- Around line 392-414: CountStdoutOutputHandler::finish() currently skips
emitting output when m_count is zero, so zero-match archives never produce a
JSON record. Update the finish() implementation for CountStdoutOutputHandler to
always emit one count object per archive_id, even when the count is 0, and then
return success. Keep the behavior consistent with the write(std::string_view)
counter logic and use the existing CountStdoutOutputHandler::finish() path to
locate the fix.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1d262b50-96a3-41c5-9319-b759bb79f23c
📒 Files selected for processing (3)
components/core/src/clp_s/CommandLineArguments.cppcomponents/core/src/clp_s/CommandLineArguments.hppcomponents/core/src/clp_s/OutputHandlerImpl.hpp
Description
Adds --count and --count-by-time aggregations to the clp-s search stdout output handler. Results are emitted per archive as newline-delimited JSON.
--count writes a single object per archive:
clp-s s archives-dir 'level: ERROR' --count{"archive_id":"<id>","count":42}--count-by-time writes one object per time bucket:
clp-s s archives-dir 'level: ERROR' --count-by-time 1000{"archive_id":"<id>","timestamp":1700000000000,"count":7}Since stdout is the default handler, the flags can be used without naming it (e.g. clp-s s --count).
Checklist
breaking change.
Validation performed
On MongoDb dataset, --count matched grep across several queries, and --count-by-time bucket counts summed to the total as well
Summary by CodeRabbit