Commit 584f581
ml-kem, module-lattice: avoid UDIV in compiled output (#289)
When compiling ML-KEM and checking the resulting binary for side-channel
leakage, several false-positive `UDIV` instructions show up in the ARM
assembly. They are a mild annoyance rather than a security issue. This
PR updates module-lattice and ml-kem to avoid the division operators
entirely.
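A minimal sketch of how a division can creep into a loop and one way to avoid it. It assumes, as the interpretation later in this message suggests, that the `UDIV`s come from iterating a range with a runtime stride. The function names are hypothetical and this is not the code from the PR.

```rust
/// Start offsets of each block, using a runtime-strided range.
/// With a non-constant step, `step_by` may need a division to work out
/// how many items the iterator yields (assumption: this depends on the
/// standard library version and optimization level).
fn block_starts_step_by(len: usize) -> Vec<usize> {
    (0..256).step_by(2 * len).collect()
}

/// Same offsets produced with a manual counter. The loop body uses only
/// additions and comparisons, so the source contains no division.
/// `len` must be non-zero, otherwise the loop would never terminate.
fn block_starts_manual(len: usize) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut start = 0;
    while start < 256 {
        starts.push(start);
        start += 2 * len;
    }
    starts
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn both_loop_shapes_agree() {
        for len in [128, 64, 32, 16, 8, 4, 2] {
            assert_eq!(block_starts_step_by(len), block_starts_manual(len));
        }
    }
}
```

Whether a given loop really produces a `UDIV` has to be confirmed in the compiled output for the target in question; the sketch only shows that the two loop shapes visit the same offsets.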
As an added bonus, `cargo bench` reports a performance win:
| Operation | master | this branch | Δ | p-value |
|-------------|----------|----------|---------|---------|
| keygen | 31.00 µs | 26.57 µs | −14.08% | < 0.05 |
| encapsulate | 27.80 µs | 22.78 µs | −18.80% | < 0.05 |
| decapsulate | 34.48 µs | 26.18 µs | −23.30% | < 0.05 |
| round_trip | 99.46 µs | 82.35 µs | −17.27% | < 0.05 |
## Raw criterion output
### master (baseline)
```
keygen time: [30.917 µs 31.003 µs 31.115 µs]
encapsulate time: [27.637 µs 27.802 µs 28.046 µs]
decapsulate time: [34.279 µs 34.479 µs 34.778 µs]
round_trip time: [99.161 µs 99.463 µs 99.854 µs]
```
### ml-kem-undivided (compared against master)
```
keygen time: [26.493 µs 26.574 µs 26.691 µs]
change: [−14.429% −14.079% −13.765%] (p = 0.00 < 0.05)
Performance has improved.
encapsulate time: [22.478 µs 22.781 µs 23.228 µs]
change: [−19.472% −18.797% −17.936%] (p = 0.00 < 0.05)
Performance has improved.
decapsulate time: [26.089 µs 26.185 µs 26.304 µs]
change: [−23.952% −23.304% −22.512%] (p = 0.00 < 0.05)
Performance has improved.
round_trip time: [81.947 µs 82.345 µs 83.057 µs]
change: [−17.604% −17.269% −16.761%] (p = 0.00 < 0.05)
Performance has improved.
```
## Claude's Interpretation
> [!NOTE]
> Take this with a grain of salt, but it does sound plausible.
- **NTT const-generic layers** (`ntt_layer<LEN, ITERATIONS>` /
  `ntt_inverse_layer<LEN, ITERATIONS>`) are the dominant win. With `LEN`
  and `ITERATIONS` compile-time constants, the inner loops unroll
  completely and LLVM auto-vectorizes the butterfly into NEON (`add.8h`,
  `sub.8h`, `cmhs.8h`, `bic.8h`). In the original form,
  `(0..256).step_by(2 * len)` carried a runtime `UDIV` and blocked
  unrolling through the outer `for len in [...]`.
- **Decapsulate benefits the most (−23%)** because it runs both `ntt`
  and `ntt_inverse` on the length-`K` vector and also hits the `D = 12`
  `byte_decode` path.
- **Keygen (−14%)** mainly benefits from the forward NTT on the secret
  and error vectors.
- **Encapsulate (−19%)** benefits from the forward NTT on the randomness
  vector and matrix-vector product in the NTT domain.

1 parent 5b84cfb commit 584f581
3 files changed
Lines changed: 69 additions & 29 deletions
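
As a rough illustration of the const-generic layer shape named in the interpretation above (`ntt_layer<LEN, ITERATIONS>`), the sketch below contrasts a runtime-strided layer with a version where both the block length and the block count are compile-time constants. Only the name `ntt_layer` and its two const parameters come from the commit message. The signature, the helper functions, the Barrett constants, and the twiddle values in the test are assumptions for illustration, not the crate's actual code.

```rust
const Q: u32 = 3329; // ML-KEM modulus

// Barrett constants (assumed values). The `/` below is evaluated at
// compile time, so it cannot become a runtime division.
const BARRETT_SHIFT: u32 = 24;
const BARRETT_MULT: u64 = (1u64 << BARRETT_SHIFT) / Q as u64;

/// Maps x in [0, 2Q) to [0, Q) with a mask instead of a branch.
fn reduce_once(x: u32) -> u32 {
    let mask = 0u32.wrapping_sub((x >= Q) as u32); // all ones if x >= Q
    x - (Q & mask)
}

fn add_mod(a: u32, b: u32) -> u32 {
    reduce_once(a + b)
}

fn sub_mod(a: u32, b: u32) -> u32 {
    reduce_once(a + Q - b)
}

/// Multiplies two values in [0, Q) modulo Q using Barrett reduction:
/// a multiply and a shift stand in for the division by Q.
fn mul_mod(a: u32, b: u32) -> u32 {
    let x = a as u64 * b as u64;
    let quotient = (x * BARRETT_MULT) >> BARRETT_SHIFT;
    reduce_once((x - quotient * Q as u64) as u32)
}

/// One butterfly on the pair (f[j], f[j + len]).
fn butterfly(f: &mut [u32; 256], j: usize, len: usize, zeta: u32) {
    let t = mul_mod(zeta, f[j + len]);
    f[j + len] = sub_mod(f[j], t);
    f[j] = add_mod(f[j], t);
}

/// Runtime-strided layer: the stride is only known at run time.
fn ntt_layer_runtime(f: &mut [u32; 256], len: usize, zetas: &[u32]) {
    for (i, start) in (0..256).step_by(2 * len).enumerate() {
        for j in start..start + len {
            butterfly(f, j, len, zetas[i]);
        }
    }
}

/// Const-generic layer: block length and block count are compile-time
/// constants, so every loop bound is known to the compiler.
fn ntt_layer<const LEN: usize, const ITERATIONS: usize>(
    f: &mut [u32; 256],
    zetas: &[u32; ITERATIONS],
) {
    assert_eq!(2 * LEN * ITERATIONS, 256);
    for i in 0..ITERATIONS {
        let start = i * 2 * LEN;
        for j in start..start + LEN {
            butterfly(f, j, LEN, zetas[i]);
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn const_generic_layer_matches_runtime_layer() {
        let mut a = [0u32; 256];
        for (i, x) in a.iter_mut().enumerate() {
            *x = (i as u32 * 13) % Q; // arbitrary test input
        }
        let mut b = a;
        // Arbitrary twiddle values, not the real ML-KEM zeta table.
        ntt_layer::<64, 2>(&mut a, &[17, 1729]);
        ntt_layer_runtime(&mut b, 64, &[17, 1729]);
        assert_eq!(a, b);
    }
}
```

A full transform would call one such layer per block length. Whether the const-generic form actually unrolls and vectorizes depends on the compiler and target, so the generated assembly should be inspected rather than assumed.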