CuFlash-Attn is maintained as a lean CUDA FlashAttention reference implementation. Favor fixes, refactors, documentation accuracy, and workflow simplification over feature growth.

Requirements:

- NVIDIA GPU with Compute Capability 7.0 or later (needed for full local validation)
- CUDA Toolkit 12.x
- CMake 3.18+
- A C++17-capable compiler
- Node.js 18+ for documentation work

Configure, build, and test with the CMake presets:

```bash
cmake --preset release
cmake --build --preset release
ctest --preset release --output-on-failure
```

Format C++ and CUDA sources before committing:

```bash
find . \( -name "*.cu" -o -name "*.cuh" -o -name "*.cpp" -o -name "*.h" \) \
  ! -path "*/build/*" ! -path "*/node_modules/*" -print0 | xargs -0 clang-format -i
```

Build the documentation site:

```bash
cd docs
npm ci
npm run docs:build
```

Project guidelines:

- Preserve the current CUDA API surface and the supported `head_dim` values: `32`, `64`, and `128`.
- Prefer deleting stale workflow layers, duplicated docs, and unused scaffolding over extending them.
- Keep documentation aligned with the real repository structure and supported behavior.
- Use CMake presets; do not introduce parallel build systems or AI control frameworks.
- Work directly on the current branch unless a maintainer asks for a different flow.
- Keep GitHub Pages focused on product documentation, not process artifacts.
- Record project history only in the root `CHANGELOG.md`.
- If a page duplicates repository content without adding user value, remove or consolidate it.
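
The fixed `head_dim` set lends itself to a compile-time check, so an accidental change to the supported values fails the build instead of surfacing at runtime. The following is a minimal C++17 sketch; the namespace `cuflash_example`, the constant `kSupportedHeadDims`, and the function `is_supported_head_dim` are hypothetical names for illustration, not part of the CuFlash-Attn API.

```cpp
#include <array>

// Hypothetical helper mirroring the documented head_dim values (32, 64, 128).
// Not part of the CuFlash-Attn API; it only illustrates the constraint.
namespace cuflash_example {

constexpr std::array<int, 3> kSupportedHeadDims = {32, 64, 128};

// Returns true only for the documented head_dim values.
constexpr bool is_supported_head_dim(int head_dim) noexcept {
    for (int d : kSupportedHeadDims) {
        if (d == head_dim) {
            return true;
        }
    }
    return false;
}

// Compile-time checks: editing the supported set breaks the build here.
static_assert(is_supported_head_dim(32));
static_assert(is_supported_head_dim(64));
static_assert(is_supported_head_dim(128));
static_assert(!is_supported_head_dim(96));

}  // namespace cuflash_example
```

Because the helper is `constexpr`, the same function can back both `static_assert` checks and ordinary runtime input validation.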

Code style:

- Format code with `clang-format` using the repository's `.clang-format`.
- Naming: namespaces `lower_case`, classes `CamelCase`, functions `lower_case`.
- Return `FlashAttentionError` from public APIs instead of throwing exceptions.

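
The naming and error-handling conventions above might look like this in practice. This is a hedged C++17 sketch: it assumes `FlashAttentionError` is a scoped enum, and the enumerators, the `AttentionShape` class, and the `validate_shape` and `error_message` functions are hypothetical names chosen for illustration, not the library's actual declarations.

```cpp
// Hypothetical sketch: lower_case namespaces and functions, CamelCase classes,
// and errors returned instead of thrown. FlashAttentionError is modeled as a
// scoped enum with made-up enumerators; the real public header may differ.
namespace cuflash_example {

enum class FlashAttentionError {
    success,
    invalid_argument,
};

// CamelCase class holding problem dimensions (illustrative only).
class AttentionShape {
public:
    constexpr AttentionShape(int batch, int num_heads, int seq_len, int head_dim) noexcept
        : batch_(batch), num_heads_(num_heads), seq_len_(seq_len), head_dim_(head_dim) {}

    constexpr int batch() const noexcept { return batch_; }
    constexpr int num_heads() const noexcept { return num_heads_; }
    constexpr int seq_len() const noexcept { return seq_len_; }
    constexpr int head_dim() const noexcept { return head_dim_; }

private:
    int batch_;
    int num_heads_;
    int seq_len_;
    int head_dim_;
};

// lower_case function: reports bad input through the return value and is
// noexcept, so callers check the result instead of using try/catch.
constexpr FlashAttentionError validate_shape(const AttentionShape& shape) noexcept {
    if (shape.batch() <= 0 || shape.num_heads() <= 0 ||
        shape.seq_len() <= 0 || shape.head_dim() <= 0) {
        return FlashAttentionError::invalid_argument;
    }
    return FlashAttentionError::success;
}

// lower_case helper that maps an error code to a readable message.
constexpr const char* error_message(FlashAttentionError err) noexcept {
    switch (err) {
        case FlashAttentionError::success:
            return "success";
        case FlashAttentionError::invalid_argument:
            return "invalid argument";
    }
    return "unknown error";
}

}  // namespace cuflash_example
```

A caller branches on the returned code, for example `if (validate_shape(shape) != FlashAttentionError::success)`, and can log `error_message(...)` without any exception handling in the call path.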
Before opening a PR, confirm:
- The build/test commands still make sense for the current environment.
- Documentation and workflow files describe what the repository actually does today.
- No deleted or deprecated process files are still linked from README, docs, or workflows.