Sequence Operations Benchmarks

Benchmarks for string collection operations like sorting across Rust and Python implementations.

Overview

Rust has several Dataframe libraries, DBMS and Search engines that heavily rely on string sorting and intersections. Those operations mostly are implemented using conventional algorithms:

Comparison-based Quicksort or Mergesort for sorting.
Hash-based or Tree-based algorithms for intersections.

Assuming the compares can be accelerated with SIMD and so can be the hash functions, StringZilla could already provide a performance boost in such applications, but starting with v4 it also provides specialized algorithms for sorting and intersections. Those are directly compatible with arbitrary string-comparable collection types with a support of an indexed access to the elements.

String Sorting

Sorting short whitespace-delimited words from xlsum.csv on a single core, in plain byte order and in Unicode case-folded order (where ß sorts as ss). The table fuses both index sorts and full sorts: argsort / arg_sort / lexsort_to_indices / sort_indices return only the index permutation, while sort / sorted / sort_values return the reordered sequence. StringZilla's argsort writes that permutation into a caller-owned buffer — in Python a NumPy out= array — so it allocates nothing per call, and Strs.sorted hands back a reordered view over the shared bytes instead of copying the strings. Every engine builds its container outside the timed region and reuses its output buffer where the API allows. StringZilla's sort is always stable, so each competitor is configured for a stable sort where it exposes one: NumPy and pandas use kind="stable", Rust's std uses the stable sort_by_key / sort_by, and Polars keeps maintain_order on in Rust — its Python Series.sort has no stability flag, so that row runs Polars' default order. For the case-folded column every folding sort shares one comparator, StringZilla's sz_utf8_uncased_order, so the gap measures the sort algorithm rather than differing Unicode tables; in Python that comparator is reached through functools.cmp_to_key, whose per-element proxy dispatch dominates the list.sort row.

Intel Xeon4 Sapphire Rapids

Library	Byte Order	Unicode Case-Folded
Rust
`stringzilla::argsort`	209.32 M compares/s	97.34 M compares/s
`polars::DataFrame::sort`	208.12 M compares/s	—
`polars::Series::sort`	205.21 M compares/s	—
`arrow::lexsort_to_indices`	122.58 M compares/s	—
`polars::Series::arg_sort`	58.06 M compares/s	—
`std::sort_by_key`	37.46 M compares/s	24.72 M compares/s

Python
`polars.Series.sort`	229.90 M compares/s	—
`stringzilla.Strs.argsort`	219.92 M compares/s	105.76 M compares/s
`stringzilla.Strs.sorted`	178.71 M compares/s	95.62 M compares/s
`pyarrow.compute.sort_indices`	72.99 M compares/s	—
`pandas.Series.sort_values`	59.44 M compares/s	—
`list.sort`	51.10 M compares/s	10.21 M compares/s
`polars.Series.arg_sort`	31.47 M compares/s	—
`numpy.argsort`	21.65 M compares/s	—

Measured June 17, 2026 on an Intel Xeon4 Sapphire Rapids, single-threaded (Polars pinned to one thread), sorting short words from xlsum.csv.

See README.md for dataset information and replication instructions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sequence Operations Benchmarks

Overview

String Sorting

Intel Xeon4 Sapphire Rapids

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Sequence Operations Benchmarks

Overview

String Sorting

Intel Xeon4 Sapphire Rapids