@@ -14,12 +14,23 @@ test notices. A break no test catches — a **surviving
1414mutant** — is a coverage blind spot that line coverage
1515cannot see.
1616
17- The canonical worked example is the net_addr audit
18- (2026-06-08 ): ** 78.2 % → 80.5 % raw** , ** 92.4 % → 95.2 %
19- equivalent-adjusted** , 14 test-only commits. The runbook
20- and results live at
17+ The first worked example is the net_addr audit (2026-06-08,
18+ a flat value-type library): **78.2% → 80.5% raw**,
19+ **92.4% → 95.2% equivalent-adjusted**, 14 test-only
20+ commits. Runbook + results:
2121` docs/refactor/net_addr_mutation_audit.md ` and
22- ` docs/refactor/net_addr_mutation_audit_results.md ` .
22+ ` …_results.md ` .
23+
24+ The at-scale example is the net_proto audit (2026-06-09, 20
25+ protocols + shared `lib`, **27,007 mutants across 21
26+ sharded runs**): **70 genuine gaps closed test-only**,
27+ including **three whole-codec omissions** (an option, two
28+ messages, with no test file at all). It is the precedent for
29+ the dependency-scoped sharding (rail #2), the whole-file-gap
30+ first-check (§6), and the result-preserving / base-coincidence
31+ equivalent classes (§4). Results:
32+ `docs/refactor/net_proto_mutation_audit_results.md` (the
33+ plan/sharding doc is `…_mutation_audit.md`).
2334
2435## When to invoke
2536
@@ -67,14 +78,38 @@ result or a near-miss during the net_addr audit.
6778 dies fast as ` MemoryError ` (counted * killed* ) instead.
6879 Monitor ` free -h ` during the run regardless.
6980
70- 2 . ** Test-command = the FULL package unit suite, never a
71- narrower scope.** A narrow test-command reports
72- ** false-positive survivors** — mutants the broader suite
73- would kill. (The trial run scoped to one test file and
74- "found" a survivor that the sibling SACK tests already
75- killed.) Running the whole package suite per mutant means
76- every survivor is a * genuine* gap no test anywhere
77- catches.
81+ 2. **Test-command = the FULL consumer set of the mutated
82+ code, never a subset of it.** A test-command that omits any
83+ test which exercises the mutated module reports
84+ **false-positive survivors**: mutants a left-out test
85+ would kill. (The net_addr trial run scoped to one test
86+ file and "found" a survivor the sibling SACK tests already
87+ killed.) The danger is *under*-scoping below the real
88+ consumer set, NOT scoping to the exact consumers.
89+ ** Refinement from the net_proto audit (4.5× scale):** for
90+ a package of * independent* modules (e.g. net_proto's 20
91+ protocols — a udp mutant can never be killed by a tcp
92+ test), the ** correct** test scope is that module's own
93+ tests ** plus the shared-dependency tests it transitively
94+ needs** (` tests/unit/lib ` ), and sharding per module that
95+ way is right — * not* the dangerous narrowing this rail
96+ warns about. It also runs ~ 3× faster and lets you
97+ prioritize / stop early. ** The one exception is shared
98+ foundation code** (net_proto's ` lib/ ` : ` inet_cksum ` ,
99+ ` int_checks ` , ` proto_* ` ) — that is consumed by every
100+ module, so its shard MUST run the ** full** package suite.
101+ Rule of thumb: scope to * exactly the set of tests that can
102+ kill a mutant in this module* , computed from the
103+ dependency graph — full-suite when in doubt, dependency-
104+ scoped when the independence is provable.
105+ ** Cross-module-constant blind spot:** a constant defined
106+ in module A but * consumed* by module B (net_proto's
107+ ` IP6__MIN_MTU ` , defined in ip6, used by icmp6 error
108+ messages) survives A's shard but is killed by B's tests.
109+ When a survivor is a bare constant with no in-module
110+ reader, check whether another module's suite kills it
111+ before calling it a gap — it is * cross-shard-covered* , not
112+ a real gap.
78113
791143 . ** Clear ` __pycache__ ` before the run AND between EVERY
80115 manual mutate→revert.** A stale ` .pyc ` makes a later run
@@ -258,6 +293,30 @@ mutated line.
258293 are killable — and those are killable ** directly** , by
259294 calling the helper with crafted inputs, NOT by coaxing
260295 the generator into producing them (see §6 lesson).
296+ 10 . ** Result-preserving optimizations** (net_proto) — an
297+ internal fast path whose output is identical regardless
298+ of how it chunks. ` inet_cksum ` 's 8-byte loop:
299+ ` (remainder := buffer_len - offset) >= 8 ` and `q_count =
300+ remainder >> 3` — mutating the chunk threshold ( ` >= 8` →
301+ ` >= 9 ` ) or the chunk count (` >> 3 ` → ` >> 4 ` ) only shifts
302+ bytes between the fast path and the remainder loop; the
303+ one's-complement sum is associative, so the checksum is
304+ byte-identical. Verify by confirming the mutant survives
305+ * every* consumer's round-trip test, then it is equivalent.
306+ 11 . ** Base-coincidence arithmetic** (net_proto) — a constant
307+ whose specific value makes an operator mutation coincide
308+ with the original on the entire * reachable* domain.
309+ Examples: ARP ` hrtype == 0x0001 ` has byte 0 = 0, so
310+ reading ` frame[1:2] ` classifies identically to
311+ ` frame[0:2] ` ; routing-header ` routing_type == RH0 ` where
312+ ` RH0 = 0 ` makes ` <= 0 ` ≡ ` == 0 ` on the byte domain; a
313+ pointer check where ` POINTER_BASE == SLOT_LEN == 4 ` makes
314+ ` (p - 4) % 4 ` ≡ ` (p + 4) % 4 ` ; IGMP max-resp-` code == 128 `
315+ where the linear value (128) equals the float decode
316+ ` (0|0x10) << 3 ` (128). Generalizes class 4 (max-value /
317+ non-negative). Killable only by an input the realistic
318+ wire never carries — usually low-value; confirm the
319+ coincidence arithmetically before deferring.
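
Classes 10 and 11 can both be confirmed arithmetically before a survivor is deferred. The sketch below is a minimal illustration, not the net_proto code: `cksum` is a hypothetical stand-in for an `inet_cksum`-style routine, with the chunk threshold and shift exposed as parameters so the mutated variants can be compared directly. The remaining checks restate the coincidences named in class 11 over the full byte domain.

```python
import random


def cksum(data: bytes, threshold: int = 8, shift: int = 3) -> int:
    """Hypothetical one's-complement checksum with an 8-byte fast path.

    `threshold` and `shift` are parameters only so the mutants
    (`>= 9`, `>> 4`) can be run side by side with the original.
    """
    total = 0
    offset = 0
    n = len(data)
    # Fast path: consume whole 8-byte chunks.
    if (remainder := n - offset) >= threshold:
        q_count = remainder >> shift
        for _ in range(q_count):
            total += int.from_bytes(data[offset:offset + 8], "big")
            offset += 8
    # Remainder loop: consume 16-bit words.
    while offset + 1 < n:
        total += int.from_bytes(data[offset:offset + 2], "big")
        offset += 2
    # Odd trailing byte is padded with a zero low byte.
    if offset < n:
        total += data[offset] << 8
    # Fold carries back in (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Class 10: the mutated threshold / chunk count only moves bytes between
# the fast path and the remainder loop, so the result is unchanged.
rng = random.Random(0)
samples = [bytes(rng.randrange(256) for _ in range(n)) for n in range(70)]
cksum_equivalent = all(
    cksum(s) == cksum(s, threshold=9) == cksum(s, shift=4) for s in samples
)

# Class 11: coincidences that hold on the whole byte domain.
pointer_equivalent = all((p - 4) % 4 == (p + 4) % 4 for p in range(256))
rh0_equivalent = all((b <= 0) == (b == 0) for b in range(256))
igmp_equivalent = ((0 | 0x10) << 3) == 128
```

A check like this complements the consumer round-trip tests: it shows why the mutant cannot be killed rather than only that no current test kills it.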
261320
262321The remainder are ** genuine gaps** . Triage each by reading
263322the mutated line; propose the test that would catch it.
@@ -287,18 +346,73 @@ input or trusting a single test file is how false claims ship.
2873465 . ** ` git checkout ` the source + clear ` __pycache__ ` again.**
2883476 . Verify ` git diff ` on the package is empty before moving on.
289348
290- For a batch of kill-proofs, wrap each in a function that
291- clears pycache between iterations — a tight ` cp/sed/run `
292- loop within the same filesystem-mtime second WILL reuse a
293- stale ` .pyc ` and lie to you.
349+ ** The self-referential-constant trap (net_proto, hit twice).**
350+ When the gap is a ` NumberReplacer ` on a * constant definition*
351+ (` TCP__MIN_MSS = 536 ` → ` 537 ` , ` IP6__DEFAULT_HOP_LIMIT = 64 `
352+ → ` 65 ` ), the killing test MUST assert against the ** literal**
353+ value (` assert x == 536 ` ), NOT against the imported constant
354+ (` assert x == TCP__MIN_MSS ` ). Asserting against the constant
355+ makes the expectation move * with* the mutation — both sides
356+ change to 537, the assert still passes, the mutant survives.
357+ This trap slipped past the first kill-proof for MIN_MSS and
358+ hop=64, so both gaps looked closed; only re-running the survivor scan exposed it.
359+ Assert the literal; optionally add a second
360+ ` assert THE_CONSTANT == 536 ` line to pin the constant by name
361+ too.
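
The trap can be reproduced in a few lines. This is an illustrative sketch: `make_module` and `clamp_mss` are hypothetical stand-ins for a module whose constant has been mutated, and only the constant name and the values 536 / 537 come from the text above.

```python
def make_module(min_mss: int) -> dict:
    """Simulate a module whose constant definition may have been mutated."""
    return {
        "TCP__MIN_MSS": min_mss,
        # Hypothetical consumer of the constant.
        "clamp_mss": lambda mss, floor=min_mss: max(mss, floor),
    }


def weak_test(mod: dict) -> bool:
    # Expectation comes from the module itself: it moves with the mutation.
    return mod["clamp_mss"](0) == mod["TCP__MIN_MSS"]


def strong_test(mod: dict) -> bool:
    # Expectation is the literal: pinned independently of the module.
    return mod["clamp_mss"](0) == 536


original = make_module(536)
mutant = make_module(537)  # NumberReplacer: 536 -> 537

results = {
    "weak_on_original": weak_test(original),
    "weak_on_mutant": weak_test(mutant),      # still passes: mutant survives
    "strong_on_original": strong_test(original),
    "strong_on_mutant": strong_test(mutant),  # fails: mutant killed
}
```

Only the literal-based assertion distinguishes the mutant from the original, which is the behaviour the survivor scan reports.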
362+
363+ ** Use a Python harness for batches, not a shell loop**
364+ (net_proto). A ` cp/sed/run ` shell loop — especially with
365+ ` r=$(kp ...) ` command substitution or a trailing ` | sort ` —
366+ can have its ** restore step (` cp bak file ` ) race or get its
367+ stdout eaten** , leaving a ** stranded mutation on disk** (it
368+ happened: 10 dhcp4 option files left with ` <= ` applied, then
369+ the next iteration found nothing to substitute). A small
370+ Python driver (` subprocess.run ` per mutant, ` open(f,"w") `
371+ restore, ` shutil.rmtree ` pycache between) is deterministic
372+ and prints each result; it never strands. Always
373+ ` git diff --stat ` the package after a batch regardless
374+ (rail #6 ).
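
A minimal sketch of such a driver follows, assuming a single textual substitution per mutant. The names `kill_proof` and `clear_pycache` are hypothetical, and `Path.write_text` stands in for the `open(f,"w")` restore mentioned above. The restore sits in a `finally` block so it runs even if the test command raises.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def clear_pycache(root: Path) -> None:
    """Remove every __pycache__ directory under root."""
    for cache in list(root.rglob("__pycache__")):
        shutil.rmtree(cache, ignore_errors=True)


def kill_proof(source: Path, old: str, new: str,
               test_cmd: list[str], root: Path) -> str:
    """Apply one mutation, run the tests, and always restore the source.

    Returns "killed", "survived", or "not-found" (nothing to substitute).
    """
    original = source.read_text()
    if old not in original:
        return "not-found"
    try:
        source.write_text(original.replace(old, new, 1))
        clear_pycache(root)
        proc = subprocess.run(test_cmd, cwd=root,
                              capture_output=True, text=True)
        # A failing test run means the mutant was detected.
        return "killed" if proc.returncode != 0 else "survived"
    finally:
        # Restore unconditionally so no mutation is stranded on disk.
        source.write_text(original)
        clear_pycache(root)
```

A batch is then a plain loop over `(source, old, new)` tuples that prints each returned status, with `test_cmd` built from `sys.executable` and the package's usual test invocation. The `git diff --stat` check afterwards remains worthwhile as an independent confirmation.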
294375
295376---
296377
297378## 6. Common real-gap patterns (where survivors actually cluster)
298379
380+ - ** Whole-file / whole-thing omissions — CHECK THIS FIRST**
381+ (net_proto). The highest-value finds are not arithmetic at
382+ all: an entire source file (an option, a message, a codec)
383+ with ** no dedicated test** , where * every* mutant in it
384+ survives. Line coverage shows it "covered" because the
385+ dispatch * imports* it, but its logic is never asserted. The
386+ net_proto audit found three — TCP FastOpen option, ICMPv6
387+ Packet Too Big, MLDv2 Query (185 survivors). **Before
388+ triaging individual operators, bucket survivors by source
389+ file and compare the count against a `find … -name
390+ 'test__*<file>*'`**: a file with ~80–185 survivors and no
391+ test file is a whole-thing gap. Close it with the full
392+ per-file test (the §8 test-matrix in ` unit_testing.md ` ),
393+ not a one-off; it converts the most mutants per unit effort.
394+ - ** The dispatch-guaranteed assert, untested everywhere**
395+ (net_proto). The ` buffer[0] == int(Type) ` / `from_bytes(...)
396+ == int(Type)` kind-byte assert at the top of every option's
397+ ` from_buffer ` was untested across ~ 30 options in 4 protocols
398+ (ip4 / dhcp4 / dhcp6 / accecn). The container dispatch
399+ * guarantees* the byte, so the assert never fires in normal
400+ flow — but ` == Type ` → ` <= ` / ` >= ` survives with no
401+ wrong-type test. One cheap shared batch: a wrong-type-below
402+ (e.g. ` 0x00 ` ) and wrong-type-above (` 0xff ` / ` 0xffff ` )
403+ ` from_buffer ` over a valid frame, expecting ` AssertionError ` .
404+ Detect the gap quickly by scripting the ` <= ` mutation across
405+ every option and re-running just that option's suite.
299406- ** Degenerate / weak fixtures.** The single most common
300- real gap. A test asserts the right output but for an input
301- where many mutations coincide:
407+ * arithmetic* gap. A test asserts the right output but for an
408+ input where many mutations coincide:
409+ - ** empty-data / zero-value operands** — an ` X + len(data) `
410+ ` __len__ ` asserted only with ` data=b"" ` (so ` X+0 == X-0 ` ,
411+ the ` + ` → ` - ` survives); a timestamp slice asserted only
412+ with a top-byte-zero value (so ` [+4:+8] ` → ` [+5:+8] ` reads
413+ the same int); a header flag round-trip with only ` rd ` set
414+ (every other bit position unexercised). Use non-empty
415+ data, a top-byte-set value, ** all flags distinct** .
302416 - all-zero-byte operands (MAC ` 02:00:00:... ` hides EUI-64
303417 field-placement bugs) → use a non-degenerate operand
304418 (` aa:bb:cc:dd:ee:ff ` ).
@@ -321,6 +435,15 @@ stale `.pyc` and lie to you.
321435 landing * exactly* on ` 0 ` or ` MAX ` (both valid). Add the
322436 exact-endpoint cases to pin the ` <= ` and the ` - 1 ` max
323437 constant.
438+ - ** One-sided length boundaries** (net_proto). A fixed-length
439+ check ` buffer[1] != LEN ` whose only wrong-length test uses
440+ an * under* -length value — ` < ` and ` != ` agree below ` LEN ` , so
441+ the ` != ` → ` < ` mutant survives; it dies only on an
442+ * over* -length frame (` LEN+1 ` ). Likewise a version/type check
443+ tested only * below* the value (DNS ` ver=5 ` , ARP ` hrtype=0 ` ,
444+ ip6 ` ver=5 ` ): ` != ` → ` < ` survives until you add an * above*
445+ case (` ver=7 ` ). Always test wrong-value on ** both** sides of
446+ a ` != ` .
324447- ** One-sided predicate tests.** A prefix predicate
325448 (` & mask == prefix ` ) tested with a None-case on only one
326449 side of the prefix. The ` == ` →` <= ` mutant needs a * below* -
@@ -433,6 +556,10 @@ characterised.
433556- ` docs/refactor/net_addr_mutation_audit_results.md ` — the
434557 exemplar results document (per-module table, equivalent
435558 ledger, kill-proven corrections).
559+ - ` docs/refactor/net_proto_mutation_audit.md ` /
560+ ` …_results.md ` — the at-scale precedent (21 sharded runs,
561+ dependency-scoped test-commands, whole-file gaps, the
562+ per-shard score table + deep-TLV follow-up seam).
436563- ` .claude/rules/unit_testing.md ` — test authoring (the
437564 corrections land as unit tests; §7.2 docstring audit, §6a
438565 mocking, tight assertions).