Duplicate metric collection error causes complete scrape failure for consumer groups #335

@lu-you

Environment

  • kminion version: 2.3.0 (also reproduced on 2.2.17)
  • MINION_CONSUMERGROUPS_SCRAPEMODE: adminApi (also reproduced with offsetsTopic)
  • MINION_CONSUMERGROUPS_GRANULARITY: partition
  • Kafka brokers: 3 nodes

Problem

When a Kafka cluster has consumer groups with a large number of members (38+)
subscribed to multiple topics (5+ topics), kminion fails to serve any metrics
and returns the following error on every scrape:

    An error has occurred while serving metrics:
    19 error(s) occurred:
    * collected metric "kminion_kafka_consumer_group_info" { label:{name:"coordinator_id" value:"1"} label:{name:"group_id" value:"my-group"} ... } was collected before with the same name and label values
    * collected metric "kminion_kafka_consumer_group_members" { label:{name:"group_id" value:"my-group"} ... } was collected before with the same name and label values
    ...

Impact

The Prometheus registry rejects the entire scrape when it detects duplicates,
so kminion exports zero metrics during affected scrape cycles.

Steps to Reproduce

  1. Create a Kafka consumer group with 38+ active members.
  2. Subscribe the group to 5+ topics simultaneously.
  3. Deploy kminion with either the adminApi or the offsetsTopic scrape mode.
  4. Hit the /metrics endpoint. The error is consistent, not intermittent.

Root Cause (hypothesis)

During concurrent metric collection, the same consumer group is processed by
multiple goroutines simultaneously:

  • In offsetsTopic mode: the group's offset data is spread across multiple
    __consumer_offsets partitions, each processed by a separate goroutine.
    Group-level metrics (group_info, group_members) are emitted once per partition
    that contains data for that group, instead of once per group.
  • In adminApi mode: similar concurrent DescribeGroups processing causes the same
    group to emit group-level metrics more than once.
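
To illustrate the hypothesis, here is a minimal, stdlib-only sketch of how per-work-item emission can produce duplicate series. It is not kminion code: the `sample` type, `collectNaive`, and `countDuplicates` are hypothetical names, and a "work item" stands in for whatever unit (partition, response, topic) each goroutine handles.

```go
package main

import (
	"fmt"
	"sync"
)

// sample is a hypothetical stand-in for one emitted metric series:
// a metric name plus the label value that identifies it.
type sample struct {
	name    string
	groupID string
}

// collectNaive emits a group-level series once per work item. When several
// work items reference the same group, the same series is emitted repeatedly.
func collectNaive(workItems []string) []sample {
	out := make(chan sample, len(workItems))
	var wg sync.WaitGroup
	for _, groupID := range workItems {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			out <- sample{name: "kminion_kafka_consumer_group_info", groupID: id}
		}(groupID)
	}
	wg.Wait()
	close(out)

	var samples []sample
	for s := range out {
		samples = append(samples, s)
	}
	return samples
}

// countDuplicates returns how many samples repeat an already-seen series.
func countDuplicates(samples []sample) int {
	seen := make(map[sample]bool)
	dups := 0
	for _, s := range samples {
		if seen[s] {
			dups++
		}
		seen[s] = true
	}
	return dups
}

func main() {
	// Three work items reference "my-group", one references "other-group".
	items := []string{"my-group", "my-group", "my-group", "other-group"}
	fmt.Println("duplicate series:", countDuplicates(collectNaive(items)))
}
```

With the input above, the sketch reports two duplicate series for my-group, which is the condition that a Prometheus registry rejects with "was collected before with the same name and label values".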

Expected Behavior

Each consumer group emits group-level metrics exactly once per scrape cycle,
regardless of how many topics it subscribes to or how many goroutines process its data.

Suggested Fix

Add a deduplication guard (e.g., sync.Map) in the consumer group metric emission
logic to ensure group-level metrics (group_info, group_members, empty_members)
are only written to the metrics channel once per group per collection cycle.
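
A rough sketch of such a guard, using only the standard library. This is an assumption about how the fix could look, not kminion's actual code: `groupSeries`, `emitGroupOnce`, and `collectDeduplicated` are hypothetical names, and the channel stands in for the metrics channel.

```go
package main

import (
	"fmt"
	"sync"
)

// groupSeries is a hypothetical stand-in for one group-level metric series.
type groupSeries struct {
	name    string
	groupID string
}

// emitGroupOnce sends the group-level series for groupID only if no other
// goroutine has already claimed that group during this collection cycle.
// LoadOrStore is atomic, so exactly one caller per group sees loaded == false.
func emitGroupOnce(seen *sync.Map, out chan<- groupSeries, groupID string) bool {
	if _, loaded := seen.LoadOrStore(groupID, struct{}{}); loaded {
		return false // another goroutine already emitted this group
	}
	out <- groupSeries{name: "kminion_kafka_consumer_group_info", groupID: groupID}
	return true
}

// collectDeduplicated processes every work item concurrently but emits
// at most one series per group.
func collectDeduplicated(workItems []string) []groupSeries {
	var seen sync.Map // must be fresh for every collection cycle
	out := make(chan groupSeries, len(workItems))
	var wg sync.WaitGroup
	for _, groupID := range workItems {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			emitGroupOnce(&seen, out, id)
		}(groupID)
	}
	wg.Wait()
	close(out)

	var series []groupSeries
	for s := range out {
		series = append(series, s)
	}
	return series
}

func main() {
	items := []string{"my-group", "my-group", "my-group", "other-group"}
	fmt.Println("series emitted:", len(collectDeduplicated(items)))
}
```

The guard has to be created per collection cycle, as in `collectDeduplicated`. A long-lived map would suppress a group's metrics on every scrape after the first.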

Workaround

Exclude affected groups via:

    MINION_CONSUMERGROUPS_IGNOREDGROUPS=my-group-1|my-group-2

This causes complete loss of visibility for those groups.
