feat(marking): borderline detection — flag rating vs derived-band disagreement by hyperpolymath · Pull Request #55 · hyperpolymath/tma-mark2

hyperpolymath · 2026-06-10T09:11:36Z

Stacked PR (4 of 4 follow-ups). Base = `feature/composer-grid-component-mapping` (PR #54). Merge order: #51 -> #52 -> #53 -> #54 -> this.

Summary

When the tutor's ordinal rating for a component disagrees significantly with the band derived from the per-component numeric grid, the composer flags it so the tutor can sanity-check before generating feedback.

Severity heuristic

Ratings ranked 0-5: `missing=0, serious_issue=1, weak=2, adequate=3, sound=4, strong=5`
Bands ranked 0-5: `fail=0, bare_fail=1, adequate=2, fair=3, good=4, excellent=5`
`|rating_rank - band_rank| >= 2` → flagged
- `>= 3` step gap → `:severe`
- `== 2` step gap → `:mild`
Sign of gap → `:over_rated` (rating > derived) or `:under_rated`

A 1-step gap (e.g. "strong" rating + good-band numeric) is not flagged — that's normal rating-vs-band variance.
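
As an illustration only, the heuristic above can be sketched in a few lines of Elixir. The module name `BorderlineSketch`, the function `classify/2`, the flag-map keys, and the choice of atoms for bands are all assumptions made for this example; the actual `borderline_check/2` takes a rating and a per-component aggregate and may be shaped differently.

```elixir
defmodule BorderlineSketch do
  # Hypothetical module for illustration. The rank tables are copied from
  # the PR description; everything else is an assumption.
  @rating_rank %{
    "missing" => 0,
    "serious_issue" => 1,
    "weak" => 2,
    "adequate" => 3,
    "sound" => 4,
    "strong" => 5
  }

  # Assumption: bands are represented as atoms.
  @band_rank %{fail: 0, bare_fail: 1, adequate: 2, fair: 3, good: 4, excellent: 5}

  # Returns a flag map, or nil when the gap is under 2 steps or when
  # either input is not recognised.
  def classify(rating, band) do
    with {:ok, r} <- Map.fetch(@rating_rank, rating),
         {:ok, b} <- Map.fetch(@band_rank, band) do
      gap = r - b

      cond do
        abs(gap) >= 3 -> flag(:severe, gap)
        abs(gap) == 2 -> flag(:mild, gap)
        true -> nil
      end
    else
      :error -> nil
    end
  end

  # A positive gap means the rating sits above the derived band.
  defp flag(severity, gap) do
    direction = if gap > 0, do: :over_rated, else: :under_rated
    %{severity: severity, direction: direction, gap: abs(gap)}
  end
end
```

Under these assumptions, `BorderlineSketch.classify("strong", :fail)` yields a `:severe` / `:over_rated` flag, `classify("adequate", :bare_fail)` yields a `:mild` one, and `classify("strong", :good)` returns `nil`.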

What ships in this PR

`borderline_check/2` public API on the composer — input rating + aggregate, output flag map or `nil`
`compose/1` result gains `:borderlines` (`%{component_id => flag}`)
LiveView
- Amber banner above the per-component cards when any flags are present
- Flagged per-component cards get amber styling + one-line explainer (`"rating: Strong, numeric: bare fail (severe over rated)"`)
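
A rough sketch of how the `:borderlines` map, the card explainer, and the banner text could fit together. Everything here is hypothetical: the module `BorderlineReportSketch`, its function names, the `{id, rating, band}` tuple input, and the flag-map keys are invented for illustration and are not taken from the PR's code. Only the two output strings are modelled on the examples quoted in this description.

```elixir
defmodule BorderlineReportSketch do
  # Hypothetical helpers; the real composer and LiveView may differ.

  # Builds %{component_id => flag}, skipping components whose check
  # returns nil. `check_fun` stands in for whatever does the comparison.
  def collect(components, check_fun) do
    components
    |> Enum.flat_map(fn {id, rating, band} ->
      case check_fun.(rating, band) do
        nil -> []
        flag -> [{id, flag}]
      end
    end)
    |> Map.new()
  end

  # One-line explainer for a flagged card.
  def explainer(%{rating: rating, band: band, severity: severity, direction: direction}) do
    "rating: #{String.capitalize(humanize(rating))}, " <>
      "numeric: #{humanize(band)} (#{severity} #{humanize(direction)})"
  end

  # Banner text; nil when nothing is flagged, so no banner is rendered.
  def banner(flags) when map_size(flags) == 0, do: nil

  def banner(flags) do
    n = map_size(flags)
    phrase = if n == 1, do: "component shows", else: "components show"
    "#{n} #{phrase} rating ↔ numeric disagreement"
  end

  # Turns :bare_fail or "serious_issue" into "bare fail" / "serious issue".
  defp humanize(value), do: value |> to_string() |> String.replace("_", " ")
end
```

Keeping the flags in a separate map, rather than folding them into the composed prose, matches the design stated in this PR: the UI can style cards and show a banner while the `did_well` and `improve` text stays untouched.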

What this does NOT do

It does not muddy the `did_well` or `improve` prose itself. The flag is a meta-observation surfaced in the UI; the composed feedback stays clean. (Easy follow-up if you want it inline in prose.)
Thresholds (2-step / 3-step) are hardcoded. Making them configurable per rubric would be a small follow-up.

Tests

45 total, 11 new
severe over_rated (strong + fail-band), severe under_rated (weak + excellent-band; missing + good-band)
mild over_rated (2-step gap)
no flag for 1-step gap (strong + good, sound + fair)
no flag when the rank gap is within tolerance (less than 2 steps)
`nil` for missing aggregate or unrecognised rating
`compose/1`: per-component flagged when disagreement; empty map when ratings agree or no numeric data

Test plan

45/45 pass via standalone elixirc + ExUnit
LiveView code passes a syntax check
In browser: enter Q1=5/25 with rating "strong" on reflective → component card turns amber, banner shows "1 component shows rating ↔ numeric disagreement"
Changing the rating to "weak" or raising the mark clears the flag

🤖 Generated with Claude Code

…agreement

When the tutor's ordinal rating disagrees significantly with the band derived from the per-component numeric grid, the composer flags it so the tutor can review before generating feedback.

Severity heuristic
- Ratings ranked 0-5 (missing..strong); bands ranked 0-5 (fail..excellent)
- |rating_rank - band_rank| >= 2 -> flagged
- >= 3 step gap -> :severe; >= 2 -> :mild
- Sign of gap -> :over_rated (rating > derived) or :under_rated

Composer module
- borderline_check/2 public API
  Input: rating string, per-component aggregate (or nil)
  Output: flag map or nil
- collect_borderlines/3 (private) builds the per-component report inside compose/1
- compose/1 result map gains :borderlines (%{component_id => flag})

LiveView
- Amber banner above the per-component cards: "N component(s) show rating ↔ numeric disagreement — review before generating."
- Per-component cards with a flag get amber styling + a one-line explainer: "rating: X, numeric: Y (severe over rated)" etc.

Tests (45 total, 11 new)
- severe over_rated (strong + fail-band)
- severe under_rated (weak + excellent-band, missing + good-band)
- mild over_rated (2-step gap)
- no flag for 1-step gap (strong + good, sound + fair)
- no flag for ranks within tolerance
- returns nil for missing aggregate / unrecognised rating
- compose/1: per-component flagged when disagreement; empty map when ratings agree or no numeric data

Verified locally via standalone elixirc + ExUnit (45/0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hyperpolymath wants to merge 1 commit into `feature/composer-grid-component-mapping` from `feature/composer-borderline-detection`
