feat(marking): borderline detection — flag rating vs derived-band disagreement#55

Open
hyperpolymath wants to merge 1 commit into feature/composer-grid-component-mapping from feature/composer-borderline-detection

@hyperpolymath

Owner

Stacked PR (4 of 4 follow-ups). Base = `feature/composer-grid-component-mapping` (PR #54). Merge order: #51 -> #52 -> #53 -> #54 -> this.

Summary

When the tutor's ordinal rating for a component disagrees by two or more rank steps with the band derived from the per-component numeric grid, the composer flags it so the tutor can sanity-check before generating feedback.

Severity heuristic

  • Ratings ranked 0-5: `missing=0, serious_issue=1, weak=2, adequate=3, sound=4, strong=5`
  • Bands ranked 0-5: `fail=0, bare_fail=1, adequate=2, fair=3, good=4, excellent=5`
  • `|rating_rank - band_rank| >= 2` → flagged
    • `>= 3` step gap → `:severe`
    • `== 2` step gap → `:mild`
  • Sign of gap → `:over_rated` (rating > derived) or `:under_rated` (rating < derived)

A 1-step gap (e.g. "strong" rating + good-band numeric) is not flagged — that's normal rating-vs-band variance.
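The heuristic above can be sketched in a few lines of Elixir. This is a minimal illustration, not the composer's actual code: the module name `BorderlineSketch`, the function `classify/2`, the flag keys, and the choice of strings for ratings and atoms for bands are all assumptions made for the example. The real `borderline_check/2` takes a per-component aggregate rather than a bare band.

```elixir
defmodule BorderlineSketch do
  # Rank tables copied from the PR description.
  @rating_ranks %{
    "missing" => 0,
    "serious_issue" => 1,
    "weak" => 2,
    "adequate" => 3,
    "sound" => 4,
    "strong" => 5
  }
  @band_ranks %{fail: 0, bare_fail: 1, adequate: 2, fair: 3, good: 4, excellent: 5}

  # Hypothetical helper: returns a flag map, or nil when the gap is under
  # 2 steps or when either input is unrecognised (including a nil band).
  def classify(rating, band) do
    with {:ok, r} <- Map.fetch(@rating_ranks, rating),
         {:ok, b} <- Map.fetch(@band_ranks, band) do
      gap = r - b

      cond do
        abs(gap) >= 3 -> flag(:severe, gap)
        abs(gap) == 2 -> flag(:mild, gap)
        true -> nil
      end
    else
      :error -> nil
    end
  end

  # Positive gap means the rating outranks the derived band.
  defp flag(severity, gap) do
    direction = if gap > 0, do: :over_rated, else: :under_rated
    %{severity: severity, direction: direction, gap: abs(gap)}
  end
end
```

Under these assumptions, `BorderlineSketch.classify("strong", :fail)` yields a `:severe`, `:over_rated` flag, while `BorderlineSketch.classify("strong", :good)` returns `nil` because the gap is a single step.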

What ships in this PR

  • `borderline_check/2` public API on the composer — input rating + aggregate, output flag map or `nil`
  • `compose/1` result gains `:borderlines` (`%{component_id => flag}`)
  • LiveView
    • Amber banner above the per-component cards when any flags are present
    • Flagged per-component cards get amber styling + one-line explainer (`"rating: Strong, numeric: bare fail (severe over rated)"`)
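As a rough illustration of how the one-line explainer could be built from a flag, here is a small sketch. The PR does not spell out the flag map's keys, so the `:rating`, `:band`, `:severity`, and `:direction` keys, along with the `ExplainerSketch` module and its `explain/1` function, are hypothetical.

```elixir
defmodule ExplainerSketch do
  # Hypothetical flag shape: the rating as a string, everything else as atoms.
  def explain(%{rating: rating, band: band, severity: severity, direction: direction}) do
    "rating: #{rating |> humanize() |> String.capitalize()}, " <>
      "numeric: #{humanize(band)} (#{severity} #{humanize(direction)})"
  end

  # Turns "bare_fail" or :over_rated into "bare fail" or "over rated".
  defp humanize(value) do
    value |> to_string() |> String.replace("_", " ")
  end
end
```

With a flag such as `%{rating: "strong", band: :bare_fail, severity: :severe, direction: :over_rated}`, this produces the example string quoted above.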

What this does NOT do

  • It does not muddy the `did_well` or `improve` prose itself. The flag is a meta-observation surfaced in the UI; the composed feedback stays clean. (Easy follow-up if you want it inline in prose.)
  • Thresholds (2 step / 3 step) are hardcoded. Configurable thresholds per rubric would be a small follow-up.

Tests

  • 45 total, 11 new
  • severe over_rated (strong + fail-band), severe under_rated (weak + excellent-band; missing + good-band)
  • mild over_rated (2-step gap)
  • no flag for 1-step gap (strong + good, sound + fair)
  • no flag for ranks within tolerance (gap under 2 steps)
  • `nil` for missing aggregate or unrecognised rating
  • `compose/1`: per-component flagged when disagreement; empty map when ratings agree or no numeric data
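The `compose/1` behaviour in the last bullet (a flag per disagreeing component, an empty map otherwise) might be assembled roughly as follows. This is a sketch under assumptions: the input shape `%{component_id => {rating, aggregate_or_nil}}`, the `CollectSketch` module, and passing the check as a function argument are illustrative choices, not the private `collect_borderlines/3` from this PR.

```elixir
defmodule CollectSketch do
  # components: %{component_id => {rating, aggregate_or_nil}} (assumed shape)
  # check: a 2-arity function behaving like borderline_check/2,
  #        returning a flag map or nil.
  def collect(components, check) when is_function(check, 2) do
    Enum.reduce(components, %{}, fn {id, {rating, aggregate}}, acc ->
      case check.(rating, aggregate) do
        # No disagreement, or no numeric data: leave the component out.
        nil -> acc
        flag -> Map.put(acc, id, flag)
      end
    end)
  end
end
```

Components whose check returns `nil` never appear in the result, so a submission with no disagreements produces `%{}`.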

Test plan

  • 45/45 pass via standalone elixirc + ExUnit
  • LiveView syntax-checks
  • In browser: enter Q1=5/25 with rating "strong" on reflective → component card turns amber, banner shows "1 component shows rating ↔ numeric disagreement"
  • Either lowering the rating to "weak" or raising the mark clears the flag

🤖 Generated with Claude Code

…agreement

When the tutor's ordinal rating disagrees significantly with the band
derived from the per-component numeric grid, the composer flags it so
the tutor can review before generating feedback.

Severity heuristic
  - Ratings ranked 0-5 (missing..strong); bands ranked 0-5 (fail..excellent)
  - |rating_rank - band_rank| >= 2 -> flagged
  - >= 3 step gap -> :severe; == 2 step gap -> :mild
  - Sign of gap -> :over_rated (rating > derived) or :under_rated

Composer module
  - borderline_check/2 public API
    Input: rating string, per-component aggregate (or nil)
    Output: flag map or nil
  - collect_borderlines/3 (private) builds the per-component report inside
    compose/1
  - compose/1 result map gains :borderlines (%{component_id => flag})

LiveView
  - Amber banner above the per-component cards: "N component(s) show
    rating ↔ numeric disagreement — review before generating."
  - Per-component cards with a flag get amber styling + a one-line
    explainer: "rating: X, numeric: Y (severe over rated)" etc.

Tests (45 total, 11 new)
  - severe over_rated (strong + fail-band)
  - severe under_rated (weak + excellent-band, missing + good-band)
  - mild over_rated (2-step gap)
  - no flag for 1-step gap (strong + good, sound + fair)
  - no flag for ranks within tolerance
  - returns nil for missing aggregate / unrecognised rating
  - compose/1: per-component flagged when disagreement; empty map when
    ratings agree or no numeric data

Verified locally via standalone elixirc + ExUnit (45/0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>