@@ -54,6 +54,14 @@ shapes that dominated the residual.
5454- Re-confirming a package after a round of test additions
5555 (use the fast survivor-only re-scan, §8).
5656
57+ ** First, classify the shard's archetype (§6.5).** Thin-unit
58+ shards (fast unit baseline) are high-ROI — run the full scan
59+ and close gaps. Integration-saturated shards (session/fsm,
60+ packet handlers) are audit-only — sample against the * real*
61+ integration baseline, classify, close one demonstrator. Do
62+ not grind an integration-saturated shard against a slow or a
63+ seam-only baseline.
64+
5765## When NOT to invoke
5866
5967- As a ` make lint ` / CI gate. Mutation testing is a
@@ -247,7 +255,11 @@ score badly understates the suite** because a large fraction
247255of survivors are ** equivalent mutants** that * no runtime test
248256can ever kill* . Report raw AND equivalent-adjusted; the
249257adjusted number is the honest one. (net_addr: 80.5 % raw but
250- ** 95.2 %** once equivalents are excluded.)
258+ ** 95.2 %** once equivalents are excluded. pytcp Tier-2: ipc
259+ 64→68 % raw / ** ~ 98 %** adjusted; socket drop-in 39 % raw /
260+ ** ~ 85 %** adjusted — the annotation-mutant fraction climbs
261+ even higher on the typed runtime, e.g. socket drop-in had
262+ ~ 431 of 579 survivors in PEP 604 unions.)
251263
252264Classify survivors with the rule below. The first two classes
253265are mechanically detectable; the rest require reading the
@@ -328,6 +340,63 @@ mutated line.
328340 non-negative). Killable only by an input the realistic
329341 wire never carries — usually low-value; confirm the
330342 coincidence arithmetically before deferring.
343+ 12 . ** Bounded-domain enum/mode comparisons** (pytcp Tier-2) —
344+ a ` == ` against one value of a small * closed* set, where
345+ the other members make ` == ` ≡` >= ` or ` == ` ≡` <= ` over the
346+ whole reachable domain. The canonical case is a sysctl
347+ mode read: ` arp_ignore == 8 ` → ` >= 8 ` is equivalent
348+ because the mode is one of ` {0,1,2,8} ` and 8 is the max;
349+ ` arp_ignore == 2 ` → ` >= 2 ` is equivalent inside the
350+ ` else ` arm where the domain is already narrowed to
351+ ` {0,1,2} ` . Generalizes class 4 to enum-valued domains.
352+ Killable only by a mode value the validator forbids —
353+ low-value; confirm the domain before deferring.
354+ 13 . ** ` __debug__ and log(...) ` guard operands** (pytcp) — the
355+ pervasive ` __debug__ and log("chan", f"...") ` tracing
356+ idiom. Tests mock ` log ` , so mutating the ` and ` → ` or ` ,
357+ the f-string interpolation (` << ` → ` * ` inside the
358+ message), or the guard is invisible: the log call's
359+ * effect* is never asserted. Any survivor whose source line
360+ is inside a ` __debug__ and log( ` argument is equivalent.
361+ Dominant alongside class 1 on the handler shards.
362+ 14 . ** Idempotent re-assignment guards** (pytcp) — a `if x !=
363+ computed_value: x = computed_value` shape where the guard
364+ only gates a (mocked) log + a re-derivation that lands on
365+ the * same* value. Mutating the guard comparison (` != ` →
366+ ` == ` , ` << ` → ` ^ ` in the RHS) changes only whether the
367+ branch is * entered* ; the branch body re-assigns the
368+ correctly-computed value either way, so end state is
369+ identical. The value-bearing assignment line (not the
370+ guard) is the killable one — and it is usually already
371+ killed. (ARP/UDP RX window-update; the `snd_wnd != win <<
372+ wsc` guard.)
373+ 15 . ** Defensively-unreachable ` case _: ` / parser-rejected
374+ branches** (pytcp) — a handler ` match ` default that bumps
375+ an ` op_unknown__drop ` counter, where the * parser* (e.g.
376+ TX-strict ` from_int(...).is_unknown ` ) already rejects the
377+ bad value upstream, so the default is dead via any
378+ realistic wire input. The ` += 1 ` NumberReplacer survives
379+ because no frame reaches it. Distinguish from a real gap
380+ by tracing whether the wire value can reach the line at
381+ all (grep the parser's reject path first).
382+ 16 . ** Keyword-only ` * ` -separator AST no-ops** (pytcp) —
383+ cosmic-ray reports a ` ReplaceBinaryOperator_Mul_Div ` on a
384+ ` def f(self, *, arg) ` line (the ` * ` is the kw-only marker,
385+ not a binary op). The mutation either no-ops or turns
386+ ` * ` into ` / ` (positional-only), and every call site passes
387+ the arg * by keyword* anyway, so behaviour is unchanged.
388+ Killable in principle by `inspect.signature(...).kind is
389+ KEYWORD_ONLY` (see the tcp/state ` KeywordOnlySignatures`
390+ batch), but for runtime handlers the kw-only contract is
391+ low-value ceremony — defer unless the signature is a
392+ public API surface.
393+ 17 . ** Never-wrapping modular masks** (pytcp) — `(a - b) &
394+ 0xFFFF_FFFF` on a sequence-delta / flight-size / RTT that
395+ never actually wraps in the tested range, so dropping or
396+ altering the mask (` & 0xFFFF_FFFF ` → ` // 0xFFFF_FFFF ` ,
397+ ` << ` ) yields the same value. Killable only by a 32-bit-
398+ wrap scenario the suite does not drive; usually equivalent
399+ in practice. Class-4 cousin for explicit wrap masks.
331400
332401The remainder are ** genuine gaps** . Triage each by reading
333402the mutated line; propose the test that would catch it.
@@ -527,6 +596,90 @@ and prints each result; it never strands. Always
527596
528597---
529598
599+ ## 6.5 Shard archetypes & baseline economics (pytcp Tier-2)
600+
601+ The pytcp audit split into two archetypes with ** opposite
602+ ROI** , and recognising which one you are in * before* spending
603+ compute is the single biggest Tier-2 lesson. The
604+ discriminator is the ** test surface** , not the source.
605+
606+ ### Archetype A — thin-unit shards (high ROI, close gaps)
607+
608+ ` lib ` , ` tcp-math ` , ` tcp/state ` , ` ipc ` value codecs, the
609+ ` socket ` drop-in. Logic reachable by a ** fast unit test**
610+ (<1 s baseline) over a constructed object. Survivors are
611+ cheap to find and cheap to close; this is where mutation
612+ testing pays for itself (ipc closed 69 gaps, socket drop-in
613+ 69). Run the full-scan workflow (§3) and close real gaps.
614+
615+ ** Mock-construct the wrapper to get a fast baseline.** A
616+ daemon-/IO-backed wrapper whose logic runs * before* the IO
617+ (the ` socket ` drop-in: type-guards, ` makefile ` mode parsing,
618+ factory dispatch, blocking-mode state) is unit-testable by
619+ constructing it over a `create_autospec(Collaborator,
620+ spec_set=True)` and patching the lazy accessor
621+ (` patch.object(mod, "_get_default_stack", ...) ` ). This turned
622+ a 6.7 s daemon-integration baseline into a 0.4 s unit
623+ baseline and let 69 gaps close fast. Always check whether the
624+ "needs a daemon/socket" surface actually needs one for the
625+ * branch under test* .
626+
627+ ### Archetype B — integration-saturated shards (audit, don't grind)
628+
629+ ` tcp/session ` , ` tcp/fsm ` , ` runtime/packet_handler ` . Logic
630+ driven only by the integration suite (FSM state, wire RX→TX,
631+ stat counters). Two sub-cases:
632+
633+ - ** Baseline-bound (session/fsm)** — the * only* test surface
634+ that exercises the logic is the slow whole-protocol suite
635+ (TCP: 590 tests / 33 s). A per-collaborator * seam* test
636+ (3 tests / 792 LOC) is ** NOT a valid mutation baseline** —
637+ it under-reports ~ 15× (ack: ** 4.7 % seam vs 33–73 %
638+ full-suite** on the same survivors). Exhaustive closure is
639+ ~ 46 h of serial compute and mostly re-confirms saturation.
640+ ** Audit, don't grind:** sample survivors against the * full*
641+ suite to get the true rate, classify the residue, close
642+ ** one kill-proven demonstrator** (recover-marker decay),
643+ and document the recommendation (opportunistic backlog).
644+ - ** Per-protocol-auditable (packet_handler)** — each protocol
645+ * does* have a green-in-isolation, fast suite (arp/udp
646+ ~ 3.6 s), so a handler file mutation-tests cleanly against
647+ its own protocol's suite. Worked examples (arp, udp) both
648+ landed at ** ~ 76 % adjusted** : the suites assert exact
649+ ` packet_stats ` counters on every branch, so branch/counter
650+ mutations die. Residue is annotation (class 1) + log guards
651+ (class 13) + bounded-mode (class 12) + low-density genuine
652+ gaps on ** under-tested branches the suite's topology never
653+ reaches** (a sysctl mode on a single-subnet topology, a
654+ multi-socket fan-out, an error-emit selection). Close those
655+ * opportunistically* with a scenario test; a full 20-handler
656+ sweep is auditable but low-yield.
657+
658+ ### The baseline-choice rule
659+
660+ ** Pick the smallest test surface that genuinely exercises the
661+ mutated logic.** A seam/parity test that only pins the
662+ * refactor boundary* (e.g. "the collaborator is wired in")
663+ exercises ~ 5 % of the code and yields a meaningless 5 % score.
664+ Before trusting a per-mutant number, sanity-check the baseline:
665+ grep the candidate test for assertions on the * behaviour* the
666+ mutated lines produce. If it only asserts wiring/identity,
667+ escalate to the protocol's full integration suite — and if
668+ * that* is the only adequate surface and it is slow, switch from
669+ "close every gap" to "sample + classify + one demonstrator".
670+
671+ ### Reporting integration-saturated shards
672+
673+ Put them in the per-shard table with the real (sampled)
674+ adjusted rate, a ` † ` /` ‡ ` footnote stating the baseline caveat
675+ (seam-under-reports / sampled-not-exhaustive), and a status of
676+ ` AUDITED ` rather than ` DONE ` . The honest deliverable is the
677+ * finding* (this code is integration-saturated; here is the
678+ true rate; here is one demonstrator; the rest is opportunistic)
679+ — not a forced exhaustive grind.
680+
681+ ---
682+
530683## 7. Deliverable — the results document
531684
532685Write ` docs/refactor/<pkg>_mutation_audit_results.md ` :
@@ -622,6 +775,14 @@ characterised.
622775 ` …_results.md ` — the at-scale precedent (21 sharded runs,
623776 dependency-scoped test-commands, whole-file gaps, the
624777 per-shard score table + deep-TLV follow-up seam).
778+ - ` docs/refactor/pytcp_mutation_audit_results.md ` — the
779+ Tier-1/Tier-2 precedent for the §6.5 archetype split:
780+ thin-unit shards (lib / tcp-math / tcp-state / ipc / socket
781+ drop-in, gaps closed) vs integration-saturated shards
782+ (session/fsm baseline-bound + audited; packet_handler
783+ per-protocol-auditable at ~ 76 % adjusted). Worked examples
784+ of the mock-construct fast baseline and the seam-under-
785+ reports caveat.
625786- ` .claude/rules/unit_testing.md ` — test authoring (the
626787 corrections land as unit tests; §7.2 docstring audit, §6a
627788 mocking, tight assertions).
0 commit comments