Commit 4035938

hyperpolymath and claude committed
docs(EXPLAINME): convert 16 Markdown code fences to AsciiDoc source blocks

Replaces 32 Markdown ``` markers with AsciiDoc [source,lang] / ---- form across the file (languages observed: rust, csv, bash + several unannotated). All blocks live inside numbered list items, so each gets a + continuation marker to preserve list structure.

Part of the cross-estate "EXPLAINME.adoc quote fixes" sweep (largest single file in the sweep).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ac49d9b commit 4035938
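
The conversion this commit describes is mechanical enough to sketch in a few lines of Python. The commit message does not say how the edit was actually made, so the following is only an illustration: `convert_fences` and `FENCE` are hypothetical names, and the rule for the trailing `+` (emit it only when a paragraph directly follows the block) is inferred from the hunks in this diff.

```python
# Hypothetical sketch, not the tool used for the commit.
# The fence marker is built from single backticks so that this listing
# contains no literal Markdown fence of its own.
FENCE = "`" * 3


def convert_fences(text: str) -> str:
    """Rewrite Markdown fences as AsciiDoc delimited blocks attached to list items."""
    lines = text.split("\n")
    out = []
    in_block = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            out.append(line)
            continue
        if not in_block:
            lang = stripped[len(FENCE):].strip()
            out.append("+")  # list continuation: attach the block to the item
            if lang:
                out.append(f"[source,{lang}]")
            out.append("----")
        else:
            out.append("----")
            following = lines[i + 1] if i + 1 < len(lines) else ""
            if following.strip():
                out.append("+")  # keep the next paragraph inside the list item
        in_block = not in_block
    return "\n".join(out)


sample = "\n".join([
    "1. Run it:",
    FENCE + "bash",
    "echo hi",
    FENCE,
    "Done.",
])
print(convert_fences(sample))
```

A fence without a language tag becomes a bare `----` block, which matches the unannotated blocks in the diff.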

1 file changed

Lines changed: 119 additions & 88 deletions

EXPLAINME.adoc

@@ -19,25 +19,32 @@ ____
 The normalizer computes a SHAKE256 digest for every image file in the dataset:

 1. **Digest Computation**: For each image file, compute SHAKE256 hash:
-```
-hash = SHAKE256(file_bytes, length=256 bits)
-hex_string = hex_encode(hash)
-```
-Code: `src/main.rs` function `shake256_d256()` (lines 48-54) uses the `tiny-keccak` crate (FIPS 202 compliant).
++
+----
+hash = SHAKE256(file_bytes, length=256 bits)
+hex_string = hex_encode(hash)
+----
++
+Code: `src/main.rs` function `shake256_d256()` (lines 48-54) uses the `tiny-keccak` crate (FIPS 202 compliant).

 2. **Manifest Creation**: All hashes written to a manifest file (`output/manifest.csv`):
-```csv
-filename,sha256,size_bytes,category
-image001.png,a1b2c3d4e5...,15234,Original
-image001.png,f6g7h8i9j0...,14876,VAE
-image002.png,k1l2m3n4o5...,18912,Original
-```
++
+[source,csv]
+----
+filename,sha256,size_bytes,category
+image001.png,a1b2c3d4e5...,15234,Original
+image001.png,f6g7h8i9j0...,14876,VAE
+image002.png,k1l2m3n4o5...,18912,Original
+----

 3. **Verification**: Users can verify all files post-transfer:
-```bash
-vae-normalizer verify --checksums -d /path/to/output
-```
-This re-computes hashes and compares against manifest. Any mismatch (bit flip, corruption, tampering) is detected and reported.
++
+[source,bash]
+----
+vae-normalizer verify --checksums -d /path/to/output
+----
++
+This re-computes hashes and compares against manifest. Any mismatch (bit flip, corruption, tampering) is detected and reported.

 4. **Formal Proof** (Isabelle/HOL): The theorem `VAEDataset_Splits.thy` (lines 120-140) proves that if all hashes match, the bijection property holds: every Original image has exactly one matching VAE image.
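
The digest pseudocode in step 1 of this hunk maps directly onto Python's standard `hashlib`, shown here as a minimal sketch. It is not the project's Rust implementation (which the text says uses `tiny-keccak`), and the helper names `shake256_hex` and `verify` are hypothetical.

```python
import hashlib


def shake256_hex(data: bytes) -> str:
    """Return the 256-bit SHAKE256 digest of data as 64 hex characters."""
    # hexdigest() takes the output length in bytes: 32 bytes = 256 bits.
    return hashlib.shake_256(data).hexdigest(32)


def verify(data: bytes, expected_hex: str) -> bool:
    """Recompute the digest and compare it with the recorded value."""
    return shake256_hex(data) == expected_hex


original = b"example image bytes"  # stand-in for a file's contents
recorded = shake256_hex(original)

corrupted = bytearray(original)
corrupted[0] ^= 0x01  # flip the lowest bit of the first byte

print(len(recorded))                       # 64
print(verify(original, recorded))          # True
print(verify(bytes(corrupted), recorded))  # False
```

The 64-character length matches the "64 hex characters, 256 bits" remark in the manifest inspection step later in the diff.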

@@ -77,15 +84,18 @@ ____
 The normalizer partitions images deterministically into 4 disjoint subsets:

 1. **Random Split Algorithm** (default):
-```rust
-let mut rng = ChaCha8Rng::seed_from_u64(seed); // Fixed seed for reproducibility
-let n = images.len();
-let train_end = (n * 70) / 100; // 70% = indices 0..train_end
-let test_end = train_end + (n * 15) / 100; // 15% = indices train_end..test_end
-let val_end = test_end + (n * 10) / 100; // 10% = indices test_end..val_end
-// Remaining: Calibration (5%)
-```
-Code: `src/main.rs` lines 100-150, function `split_random()`.
++
+[source,rust]
+----
+let mut rng = ChaCha8Rng::seed_from_u64(seed); // Fixed seed for reproducibility
+let n = images.len();
+let train_end = (n * 70) / 100; // 70% = indices 0..train_end
+let test_end = train_end + (n * 15) / 100; // 15% = indices train_end..test_end
+let val_end = test_end + (n * 10) / 100; // 10% = indices test_end..val_end
+// Remaining: Calibration (5%)
+----
++
+Code: `src/main.rs` lines 100-150, function `split_random()`.

 2. **Stratified Split Option** (optional):
 - Groups images by file size bucket (e.g., "small" = 0-10KB, "medium" = 10-50KB, etc.)
@@ -94,13 +104,14 @@ The normalizer partitions images deterministically into 4 disjoint subsets:
 Code: `src/main.rs` lines 160-200, function `split_stratified()`.

 3. **Output Files**: Four text files, one per split:
-```
-output/splits/
-├── random_train.txt # 70% of filenames
-├── random_test.txt # 15%
-├── random_val.txt # 10%
-└── random_calibration.txt # 5%
-```
++
+----
+output/splits/
+├── random_train.txt # 70% of filenames
+├── random_test.txt # 15%
+├── random_val.txt # 10%
+└── random_calibration.txt # 5%
+----

 4. **Formal Verification** (Isabelle/HOL):
 The theorem `VAEDataset_Splits.thy` (lines 1-50) proves three properties:
@@ -109,9 +120,11 @@ The normalizer partitions images deterministically into 4 disjoint subsets:
 - **Ratio Correctness**: |Train| / |Dataset| ≈ 0.70 (within 1% tolerance)

 To verify:
-```bash
-isabelle build -d . -b VAEDataset_Splits
-```
++
+[source,bash]
+----
+isabelle build -d . -b VAEDataset_Splits
+----

 **Code Evidence:**
 - Random split: `src/main.rs` lines 100-150
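
The boundary arithmetic in the `split_random()` fragment can be checked by hand. The sketch below reproduces only the integer division shown in the diff, in Python rather than Rust; the shuffle is omitted and `split_sizes` is a hypothetical name.

```python
def split_sizes(n: int) -> tuple:
    """Return (train, test, val, calibration) sizes for a dataset of n images."""
    # Same floor division as the fragment: each boundary rounds down,
    # so the calibration split absorbs whatever remains.
    train_end = (n * 70) // 100
    test_end = train_end + (n * 15) // 100
    val_end = test_end + (n * 10) // 100
    return (train_end, test_end - train_end, val_end - test_end, n - val_end)


for n in (10, 1000):
    print(n, split_sizes(n))
# 10 (7, 1, 1, 1)
# 1000 (700, 150, 100, 50)
```

For n = 10 the test split comes out at 10% of the data rather than 15%, and calibration at 10% rather than 5%, so a ±1% ratio check is only meaningful for reasonably large datasets.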
@@ -217,82 +230,100 @@ If the RNG implementation has a bug, or if the system experiences a cosmic ray b
 === Test Checksum Computation

 1. Normalize a small test dataset:
-```bash
-vae-normalizer normalize -d examples/test-dataset -o output
-```
++
+[source,bash]
+----
+vae-normalizer normalize -d examples/test-dataset -o output
+----

 2. Inspect manifest:
-```bash
-cat output/manifest.csv
-# Observe SHAKE256 hashes (64 hex characters, 256 bits)
-```
++
+[source,bash]
+----
+cat output/manifest.csv
+# Observe SHAKE256 hashes (64 hex characters, 256 bits)
+----

 3. Corrupt a file and verify detection:
-```bash
-# Flip a bit in one image
-xxd -r -p - output/Original/image001.png <<< "FF" | head -c1 | dd of=output/Original/image001.png bs=1 count=1 conv=notrunc
++
+[source,bash]
+----
+# Flip a bit in one image
+xxd -r -p - output/Original/image001.png <<< "FF" | head -c1 | dd of=output/Original/image001.png bs=1 count=1 conv=notrunc

-# Verify
-vae-normalizer verify -o output --checksums
-# Error: image001.png hash mismatch — detected corruption
-```
+# Verify
+vae-normalizer verify -o output --checksums
+# Error: image001.png hash mismatch — detected corruption
+----

 === Test Split Disjointness

 1. Run split:
-```bash
-vae-normalizer normalize -d examples/test-dataset -o output
-```
++
+[source,bash]
+----
+vae-normalizer normalize -d examples/test-dataset -o output
+----

 2. Check for overlaps:
-```bash
-# Count unique filenames across splits
-cat output/splits/*.txt | sort | uniq | wc -l
-# Should equal total file count
-
-# Check no duplicates within splits
-cat output/splits/random_train.txt | sort | uniq -d
-# Should be empty (no duplicates)
-```
++
+[source,bash]
+----
+# Count unique filenames across splits
+cat output/splits/*.txt | sort | uniq | wc -l
+# Should equal total file count
+
+# Check no duplicates within splits
+cat output/splits/random_train.txt | sort | uniq -d
+# Should be empty (no duplicates)
+----

 3. Verify ratios:
-```bash
-# Manual calculation
-train=$(wc -l < output/splits/random_train.txt)
-test=$(wc -l < output/splits/random_test.txt)
-val=$(wc -l < output/splits/random_val.txt)
-calib=$(wc -l < output/splits/random_calibration.txt)
-total=$((train + test + val + calib))
-
-echo "Train: $((100 * train / total))% (target 70%)"
-echo "Test: $((100 * test / total))% (target 15%)"
-# Should be ±1% of targets
-```
++
+[source,bash]
+----
+# Manual calculation
+train=$(wc -l < output/splits/random_train.txt)
+test=$(wc -l < output/splits/random_test.txt)
+val=$(wc -l < output/splits/random_val.txt)
+calib=$(wc -l < output/splits/random_calibration.txt)
+total=$((train + test + val + calib))
+
+echo "Train: $((100 * train / total))% (target 70%)"
+echo "Test: $((100 * test / total))% (target 15%)"
+# Should be ±1% of targets
+----

 === Run Formal Proofs

 1. Install Isabelle:
-```bash
-# On Fedora/RHEL
-dnf install isabelle
++
+[source,bash]
+----
+# On Fedora/RHEL
+dnf install isabelle

-# Or build from source
-git clone https://github.com/isabelle-prover/isabelle
-cd isabelle && ./build
-```
+# Or build from source
+git clone https://github.com/isabelle-prover/isabelle
+cd isabelle && ./build
+----

 2. Verify theorems:
-```bash
-cd /var/mnt/eclipse/repos/zerostep
-isabelle build -d . -b VAEDataset_Splits
-# Output: Build session VAEDataset_Splits — 100% complete
-```
++
+[source,bash]
+----
+cd /var/mnt/eclipse/repos/zerostep
+isabelle build -d . -b VAEDataset_Splits
+# Output: Build session VAEDataset_Splits — 100% complete
+----

 3. Inspect proof:
-```bash
-cat theories/VAEDataset_Splits.thy | grep "theorem\|lemma" | head
-# Lists all proven propositions
-```
++
+[source,bash]
+----
+cat theories/VAEDataset_Splits.thy | grep "theorem\|lemma" | head
+# Lists all proven propositions
+----

 ---
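
As an alternative to the `xxd`/`dd` pipeline in the "Corrupt a file and verify detection" step, a single bit can be flipped in place with a few lines of Python. This is a sketch only: `flip_first_bit` is a hypothetical helper, and the demonstration writes to a temporary file rather than to a real dataset.

```python
import os
import tempfile


def flip_first_bit(path: str) -> None:
    """Flip the lowest bit of the first byte of the file at path, in place."""
    with open(path, "r+b") as f:  # read/write without truncating the file
        first = f.read(1)
        if not first:
            raise ValueError("cannot corrupt an empty file")
        f.seek(0)
        f.write(bytes([first[0] ^ 0x01]))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image001.png")
    with open(path, "wb") as f:
        f.write(b"\x89PNG demo")  # placeholder bytes, not a valid image
    flip_first_bit(path)
    with open(path, "rb") as f:
        after = f.read()

print(after[:1].hex(), len(after))  # 88 9
```

Only the first byte changes (0x89 becomes 0x88) and the file length is preserved, so a subsequent checksum verification should report exactly one mismatching file.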
