Skip to content

feat(clp-s): Support count and count-by-time aggregation output to stdout.#2342

Open
davemarco wants to merge 27 commits into
y-scope:mainfrom
davemarco:stdout
Open

feat(clp-s): Support count and count-by-time aggregation output to stdout.#2342
davemarco wants to merge 27 commits into
y-scope:mainfrom
davemarco:stdout

Conversation

@davemarco

@davemarco davemarco commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Description

Adds --count and --count-by-time aggregations to the clp-s search stdout output handler. Results are emitted per archive as newline-delimited JSON.

--count writes a single object per archive:

clp-s s archives-dir 'level: ERROR' --count
{"archive_id":"<id>","count":42}

--count-by-time writes one object per time bucket:

clp-s s archives-dir 'level: ERROR' --count-by-time 1000
{"archive_id":"<id>","timestamp":1700000000000,"count":7}

Since stdout is the default handler, the flags can be used without naming it (e.g. clp-s s --count).

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

On MongoDb dataset, --count matched grep across several queries, and --count-by-time bucket counts summed to the total as well

Summary by CodeRabbit

  • New Features
    • Added centralized search aggregation controls for counting results overall or by time bucket.
    • Standard output can now emit JSON-formatted aggregated counts (overall total or per time bucket).
  • Bug Fixes
    • Improved validation to reject aggregation options for output handlers that don’t support them, and tightened aggregation parsing consistency.
  • Documentation
    • Updated search help/visible options to reflect the new aggregation command format.

davemarco and others added 6 commits June 11, 2026 13:36
…e results cache.

Allows --count and --count-by-time to be used with the results-cache output handler, writing aggregation results directly to MongoDB via _id-keyed atomic $inc upserts so that partial results from multiple search processes merge correctly without a reducer.
…he search usage text outputs to the reducer.
…dout.

Adds `--count` and `--count-by-time` to the stdout output handler, emitting per-archive results as newline-delimited JSON. The options may be given without naming the `stdout` handler (e.g. `clp-s s <archives> <query> --count`), and the kv-ir search path now reports these aggregations as unsupported, consistent with the reducer and results-cache handlers.
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

Search aggregation options are centralized in CommandLineArguments, handler parsing now rejects unsupported combinations or requires aggregation where needed, stdout count handlers emit JSON, and search routing plus IR search use the shared aggregation state.

Changes

Search aggregation plumbing

Layer / File(s) Summary
Aggregation contract and help text
components/core/src/clp_s/CommandLineArguments.hpp, components/core/src/clp_s/CommandLineArguments.cpp
CommandLineArguments adds search-level aggregation options, stores shared aggregation state, exposes getters, removes per-handler aggregation fields, and updates search help text.
Handler validation rules
components/core/src/clp_s/CommandLineArguments.cpp
Unsupported handlers reject aggregation, reducer parsing requires aggregation, stdout validates leftover options, and results-cache no longer copies aggregation state from handler options.
Stdout aggregation handlers
components/core/src/clp_s/OutputHandlerImpl.hpp, components/core/src/clp_s/OutputHandlerImpl.cpp
New stdout handlers count messages or time buckets and emit JSON summaries in finish().
Search handler routing
components/core/src/clp_s/clp-s.cpp
search_archive selects reducer, results-cache, and stdout handlers from the shared aggregation type.
KV IR aggregation gate
components/core/src/clp_s/kv_ir_search.cpp
search_kv_ir_stream checks the shared aggregation type before returning the aggregation-not-implemented error.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • y-scope/clp#2229: Also refactors how clp_s parses and propagates --count / --count-by-time through CommandLineArguments and output-handler selection.
  • y-scope/clp#2326: Also changes CommandLineArguments aggregation parsing and results-cache output-handler wiring for count-based search output.

Suggested reviewers

  • gibber9809
  • davidlion
  • LinZhihao-723
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarises the main user-facing change: adding count and count-by-time aggregation output for stdout.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@davemarco davemarco requested a review from davidlion June 19, 2026 02:51
davidlion and others added 4 commits June 24, 2026 19:22
…ers.

- Take string_view for the results-cache handler uri/collection/dataset/archive_id
  params and materialize std::string only at the mongocxx call sites.
- Drop the redundant ::clp_s:: qualifier on the count handlers' base class.
- Rename the count-by-time write timestamp param to timestamp_ms to match the
  bucket size unit.
…n handler.

Merges the results-cache count work and applies the same review fixes to
AggregationToStdoutOutputHandler: take string_view for archive_id, drop the
redundant ::clp_s:: qualifier on the base class, rename the count-by-time
bucket size to count_by_time_bucket_size_ms, and rename the count-by-time
write timestamp param to timestamp_ms.
@davemarco davemarco marked this pull request as ready for review June 25, 2026 19:22
@davemarco davemarco requested review from a team and gibber9809 as code owners June 25, 2026 19:22

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/core/src/clp_s/OutputHandlerImpl.cpp`:
- Around line 240-248: CountStdoutOutputHandler::finish is returning early when
m_count is zero, which skips emitting the archive JSON for --count. Update the
finish method so it always writes one JSON object per archive, including the
archive_id and a count of 0, and only uses the success return after emitting the
result; keep the existing JSON construction in CountStdoutOutputHandler::finish
and remove the zero-count early exit.

In `@components/core/src/clp_s/OutputHandlerImpl.hpp`:
- Around line 437-445: The bucket calculation in OutputHandlerImpl::write is
using truncating division, so negative timestamps are assigned to the wrong time
bucket. Update the bucket derivation to use floor-style bucketing for
epochtime_t values, and make the same change in the sibling count-by-time
handlers so their bucket logic stays consistent with
m_count_by_time_bucket_size_ms.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2441ee68-0def-417a-aa79-bae16df62ca7

📥 Commits

Reviewing files that changed from the base of the PR and between 0e7535a and 8381d82.

📒 Files selected for processing (6)
  • components/core/src/clp_s/CommandLineArguments.cpp
  • components/core/src/clp_s/CommandLineArguments.hpp
  • components/core/src/clp_s/OutputHandlerImpl.cpp
  • components/core/src/clp_s/OutputHandlerImpl.hpp
  • components/core/src/clp_s/clp-s.cpp
  • components/core/src/clp_s/kv_ir_search.cpp

Comment thread components/core/src/clp_s/OutputHandlerImpl.cpp
Comment thread components/core/src/clp_s/OutputHandlerImpl.hpp

@davidlion davidlion left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comments. We're currently in discussion about the best way to go about de-duplicating the aggregator x output-handler logic.

Comment thread components/core/src/clp_s/CommandLineArguments.cpp Outdated
Comment thread components/core/src/clp_s/CommandLineArguments.hpp Outdated
Comment thread components/core/src/clp_s/OutputHandlerImpl.hpp Outdated
search_options.add(match_options);

po::options_description aggregation_options("Aggregation Controls");
// clang-format off

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does look slightly worse without it. like it wraps the fields and descriptions on the same line. in my opinion
it reads nicer without the auto wrapping.

…tion_for_handler and a <clp_s/...> include in OutputHandlerImpl.
@davemarco davemarco requested a review from davidlion June 26, 2026 15:46

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/core/src/clp_s/OutputHandlerImpl.hpp (1)

392-414: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Emit zero-count archives from finish().

The paired CountStdoutOutputHandler::finish() implementation currently returns early when m_count == 0, so searched archives with zero matches produce no JSON at all. That breaks the advertised --count contract of one object per archive and makes “0 matches” indistinguishable from “archive was never searched”.

Proposed fix in components/core/src/clp_s/OutputHandlerImpl.cpp
 auto CountStdoutOutputHandler::finish() -> ErrorCode {
-    if (0 == m_count) {
-        return ErrorCode::ErrorCodeSuccess;
-    }
-
     nlohmann::json result;
     result[constants::results_cache::search::cArchiveId] = m_archive_id;
     result[constants::results_cache::search::cCount] = m_count;
     std::cout << result.dump() << '\n';
     return ErrorCode::ErrorCodeSuccess;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/core/src/clp_s/OutputHandlerImpl.hpp` around lines 392 - 414,
CountStdoutOutputHandler::finish() currently skips emitting output when m_count
is zero, so zero-match archives never produce a JSON record. Update the finish()
implementation for CountStdoutOutputHandler to always emit one count object per
archive_id, even when the count is 0, and then return success. Keep the behavior
consistent with the write(std::string_view) counter logic and use the existing
CountStdoutOutputHandler::finish() path to locate the fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@components/core/src/clp_s/OutputHandlerImpl.hpp`:
- Around line 392-414: CountStdoutOutputHandler::finish() currently skips
emitting output when m_count is zero, so zero-match archives never produce a
JSON record. Update the finish() implementation for CountStdoutOutputHandler to
always emit one count object per archive_id, even when the count is 0, and then
return success. Keep the behavior consistent with the write(std::string_view)
counter logic and use the existing CountStdoutOutputHandler::finish() path to
locate the fix.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d262b50-96a3-41c5-9319-b759bb79f23c

📥 Commits

Reviewing files that changed from the base of the PR and between 8381d82 and 443a887.

📒 Files selected for processing (3)
  • components/core/src/clp_s/CommandLineArguments.cpp
  • components/core/src/clp_s/CommandLineArguments.hpp
  • components/core/src/clp_s/OutputHandlerImpl.hpp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants