You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
docs(EXPLAINME): convert 16 Markdown code fences to AsciiDoc source blocks
Replaces 32 Markdown ``` markers with AsciiDoc [source,lang] / ---- form
across the file (languages observed: rust, csv, bash + several
unannotated). All blocks live inside numbered list items, so each gets a
+ continuation marker to preserve list structure. Part of the cross-estate
"EXPLAINME.adoc quote fixes" sweep (largest single file in the sweep).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This re-computes hashes and compares against manifest. Any mismatch (bit flip, corruption, tampering) is detected and reported.
41
48
42
49
4. **Formal Proof** (Isabelle/HOL): The theorem `VAEDataset_Splits.thy` (lines 120-140) proves that if all hashes match, the bijection property holds: every Original image has exactly one matching VAE image.
43
50
@@ -77,15 +84,18 @@ ____
77
84
The normalizer partitions images deterministically into 4 disjoint subsets:
78
85
79
86
1. **Random Split Algorithm** (default):
80
-
```rust
81
-
let mut rng = ChaCha8Rng::seed_from_u64(seed); // Fixed seed for reproducibility
82
-
let n = images.len();
83
-
let train_end = (n * 70) / 100; // 70% = indices 0..train_end
84
-
let test_end = train_end + (n * 15) / 100; // 15% = indices train_end..test_end
85
-
let val_end = test_end + (n * 10) / 100; // 10% = indices test_end..val_end
86
-
// Remaining: Calibration (5%)
87
-
```
88
-
Code: `src/main.rs` lines 100-150, function `split_random()`.
87
+
+
88
+
[source,rust]
89
+
----
90
+
let mut rng = ChaCha8Rng::seed_from_u64(seed); // Fixed seed for reproducibility
91
+
let n = images.len();
92
+
let train_end = (n * 70) / 100; // 70% = indices 0..train_end
93
+
let test_end = train_end + (n * 15) / 100; // 15% = indices train_end..test_end
94
+
let val_end = test_end + (n * 10) / 100; // 10% = indices test_end..val_end
95
+
// Remaining: Calibration (5%)
96
+
----
97
+
+
98
+
Code: `src/main.rs` lines 100-150, function `split_random()`.
89
99
90
100
2. **Stratified Split Option** (optional):
91
101
- Groups images by file size bucket (e.g., "small" = 0-10KB, "medium" = 10-50KB, etc.)
@@ -94,13 +104,14 @@ The normalizer partitions images deterministically into 4 disjoint subsets:
94
104
Code: `src/main.rs` lines 160-200, function `split_stratified()`.
95
105
96
106
3. **Output Files**: Four text files, one per split:
97
-
```
98
-
output/splits/
99
-
├── random_train.txt # 70% of filenames
100
-
├── random_test.txt # 15%
101
-
├── random_val.txt # 10%
102
-
└── random_calibration.txt # 5%
103
-
```
107
+
+
108
+
----
109
+
output/splits/
110
+
├── random_train.txt # 70% of filenames
111
+
├── random_test.txt # 15%
112
+
├── random_val.txt # 10%
113
+
└── random_calibration.txt # 5%
114
+
----
104
115
105
116
4. **Formal Verification** (Isabelle/HOL):
106
117
The theorem `VAEDataset_Splits.thy` (lines 1-50) proves three properties:
@@ -109,9 +120,11 @@ The normalizer partitions images deterministically into 4 disjoint subsets:
0 commit comments