Benchmarks for string collection operations like sorting across Rust and Python implementations.
Rust has several Dataframe libraries, DBMS and Search engines that heavily rely on string sorting and intersections. Those operations mostly are implemented using conventional algorithms:
- Comparison-based Quicksort or Mergesort for sorting.
- Hash-based or Tree-based algorithms for intersections.
Assuming the compares can be accelerated with SIMD and so can be the hash functions, StringZilla could already provide a performance boost in such applications, but starting with v4 it also provides specialized algorithms for sorting and intersections. Those are directly compatible with arbitrary string-comparable collection types with a support of an indexed access to the elements.
Sorting short whitespace-delimited words from xlsum.csv on a single core, in plain byte order and in Unicode case-folded order (where ß sorts as ss).
The table fuses both index sorts and full sorts: argsort / arg_sort / lexsort_to_indices / sort_indices return only the index permutation, while sort / sorted / sort_values return the reordered sequence.
StringZilla's argsort writes that permutation into a caller-owned buffer — in Python a NumPy out= array — so it allocates nothing per call, and Strs.sorted hands back a reordered view over the shared bytes instead of copying the strings.
Every engine builds its container outside the timed region and reuses its output buffer where the API allows.
StringZilla's sort is always stable, so each competitor is configured for a stable sort where it exposes one: NumPy and pandas use kind="stable", Rust's std uses the stable sort_by_key / sort_by, and Polars keeps maintain_order on in Rust — its Python Series.sort has no stability flag, so that row runs Polars' default order.
For the case-folded column every folding sort shares one comparator, StringZilla's sz_utf8_uncased_order, so the gap measures the sort algorithm rather than differing Unicode tables; in Python that comparator is reached through functools.cmp_to_key, whose per-element proxy dispatch dominates the list.sort row.
| Library | Byte Order | Unicode Case-Folded |
|---|---|---|
| Rust | ||
stringzilla::argsort |
209.32 M compares/s | 97.34 M compares/s |
polars::DataFrame::sort |
208.12 M compares/s | — |
polars::Series::sort |
205.21 M compares/s | — |
arrow::lexsort_to_indices |
122.58 M compares/s | — |
polars::Series::arg_sort |
58.06 M compares/s | — |
std::sort_by_key |
37.46 M compares/s | 24.72 M compares/s |
| Python | ||
polars.Series.sort |
229.90 M compares/s | — |
stringzilla.Strs.argsort |
219.92 M compares/s | 105.76 M compares/s |
stringzilla.Strs.sorted |
178.71 M compares/s | 95.62 M compares/s |
pyarrow.compute.sort_indices |
72.99 M compares/s | — |
pandas.Series.sort_values |
59.44 M compares/s | — |
list.sort |
51.10 M compares/s | 10.21 M compares/s |
polars.Series.arg_sort |
31.47 M compares/s | — |
numpy.argsort |
21.65 M compares/s | — |
Measured June 17, 2026 on an Intel Xeon4 Sapphire Rapids, single-threaded (Polars pinned to one thread), sorting short words from
xlsum.csv.
See README.md for dataset information and replication instructions.