Production-grade antimicrobial resistance (AMR) surveillance toolkit for Python 3.14+
Python AMR is a complete native Python port of the R AMR package, delivering high-performance AMR analytics without runtime R dependencies. Built on an async architecture, DuckDB, and Polars, it gives clinical microbiologists, epidemiologists, and data engineers enterprise-ready tools for resistance surveillance, antibiogram generation, and predictive analytics.
- Overview
- Key Features
- Architecture Highlights
- Installation
- Quick Start
- Core Capabilities
- Data Pipeline
- Documentation
- Development
- Testing
- Performance
- Port Status
- Contributing
- License
- Acknowledgments
Python AMR provides comprehensive antimicrobial resistance analytics through three interfaces:
- CLI (`amr` command) - For scripting and automation
- REST API (FastAPI) - For integration with web applications and services
- Python Library - For programmatic access in notebooks and applications
All interfaces share the same high-performance core engine with async persistence, DuckDB-accelerated analytics, and production-grade observability.
- SIR Interpretation - EUCAST/CLSI guideline-based susceptibility interpretation
- Antibiogram Generation - Automated resistance profiles with confidence intervals
- Breakpoint Queries - Clinical and epidemiological breakpoint lookup
- MDRO Detection - Multi-drug resistant organism screening (EUCAST guidelines)
- First Isolate Selection - Episode-based deduplication for surveillance
- Resistance Prediction - Time-series forecasting with ARIMA/exponential smoothing
- Microorganism Codes - 75,000+ organisms with taxonomy, SNOMED, and prevalence data
- Antimicrobial Codes - 465+ antibiotics with ATC, LOINC, PubChem, and DDDs
- Antiviral Codes - 85+ antivirals with LOINC codes
- Fuzzy Matching - Intelligent text extraction and normalization
- Async Persistence - Non-blocking run storage with queue-based workers
- DuckDB Engine - Columnar analytics for high-throughput queries
- Dead-letter Queue - Automatic failure capture with replay capability
- Observability - Prometheus-compatible metrics, structured logging
- Run Auditing - Full lineage tracking with metadata and provenance
- EUCAST Guidelines - v10-15 with ~1,300 interpretive rules
- CLSI Guidelines - 2010-2025 breakpoints (interpretive rules not available)
- WHONET Format - Import/export compatibility
- Veterinary Support - Animal host breakpoints (2,436 veterinary rules)
- Multi-language - 28-language translation support
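First isolate selection appears in the feature list above without an example. As a rough illustration of what episode-based deduplication means, the sketch below flags an isolate as "first" when no earlier isolate of the same organism from the same patient falls inside the current episode window. It uses only the standard library and is not the package's implementation: the `first_isolates` helper, the record keys, and the 365-day default window are assumptions made for illustration.

```python
from datetime import date, timedelta


def first_isolates(isolates, episode_days=365):
    """Flag the first isolate of each episode per (patient, organism).

    `isolates` is a list of dicts with hypothetical keys
    "patient_id", "organism" and "date". Returns a list of booleans
    aligned with the input order.
    """
    window = timedelta(days=episode_days)
    # Visit isolates chronologically, but remember their original positions.
    order = sorted(range(len(isolates)), key=lambda i: isolates[i]["date"])
    episode_start = {}  # (patient, organism) -> start date of current episode
    flags = [False] * len(isolates)
    for i in order:
        iso = isolates[i]
        key = (iso["patient_id"], iso["organism"])
        start = episode_start.get(key)
        # A new episode begins on first sight, or once the window has elapsed.
        if start is None or iso["date"] - start >= window:
            episode_start[key] = iso["date"]
            flags[i] = True
    return flags


isolates = [
    {"patient_id": 1, "organism": "B_ESCHR_COLI", "date": date(2024, 1, 5)},
    {"patient_id": 1, "organism": "B_ESCHR_COLI", "date": date(2024, 3, 1)},
    {"patient_id": 1, "organism": "B_ESCHR_COLI", "date": date(2025, 2, 1)},
    {"patient_id": 2, "organism": "B_ESCHR_COLI", "date": date(2024, 1, 10)},
]
print(first_isolates(isolates))  # [True, False, True, True]
```

The second isolate from patient 1 is dropped because it falls inside the episode opened on 2024-01-05. The third opens a new episode because more than 365 days have passed.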
┌─────────────────────────────────────────────────────────────┐
│ Interfaces Layer │
│ ┌──────────────┬──────────────────┬────────────────────┐ │
│ │ CLI (Typer) │ API (FastAPI) │ Python Library │ │
│ └──────────────┴──────────────────┴────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Core Engine │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ amr.core - Domain Logic │ │
│ │ • SIR interpretation • Antibiograms • Analytics │ │
│ │ • MO/AB/AV normalization • MDRO detection │ │
│ │ • Breakpoint resolution • Predictive models │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Data & Persistence │
│ ┌──────────────────┬─────────────────────────────────┐ │
│ │ Reference Data │ Run Storage │ │
│ │ • Parquet │ • SQLAlchemy (SQL metadata) │ │
│ │ • NDJSON │ • DuckDB (columnar analytics) │ │
│ │ • Polars │ • Async queue workers │ │
│ └──────────────────┴─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- Python 3.14 - Latest performance improvements and typing features
- Polars - High-performance DataFrame library (multi-threaded, with lazy evaluation; often much faster than pandas on large analytical workloads)
- DuckDB - OLAP database for analytical queries
- SQLAlchemy 2.0 - Async ORM for metadata persistence
- FastAPI - Modern async web framework
- Typer - CLI with rich help and validation
- Pydantic 2.0 - Data validation and serialization
- Python 3.14.x (required)
- 4GB+ RAM recommended for large datasets
- 1GB+ disk space for reference data
# Clone repository
git clone https://github.com/beak-insights/AMR.git
cd AMR/python-amr
# Create virtual environment
python3.14 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install with development dependencies
pip install -e ".[dev]"
# Verify installation
amr --version
pytest tests/unit/test_interpretation.py -v
# For XLSX import support
pip install pandas openpyxl
# For PostgreSQL persistence
pip install asyncpg
# For advanced plotting
pip install seaborn plotly
# Interpret susceptibility results
echo '[{"value": "S"}, {"value": "R"}]' | \
amr sir - --microorganism B_ESCHR_COLI --antimicrobial AMX --guideline "EUCAST 2025"
# Generate antibiogram from CSV
amr antibiogram data/isolates.csv \
--pathogen-column organism \
--antimicrobial-columns AMX,CIP,GEN \
--minimum 30
# Predict resistance trends
amr predict-resistance trends.csv \
--date-column collection_date \
--sir-column amoxicillin_sir \
--frequency monthly \
--forecast-periods 6
# Normalize microorganism names
amr mo-normalize '["E. coli", "Staph aureus", "Pseudomonas"]'
# Query clinical breakpoints
amr breakpoints-query \
--guideline "EUCAST 2025" \
--mo B_ESCHR_COLI \
--ab CIP \
--method MIC
Start the server:
source .venv/bin/activate
PYTHONPATH=src uvicorn amr.api.app:app --host 0.0.0.0 --port 8000
Make requests:
# Health check
curl http://localhost:8000/health
# SIR interpretation
curl -X POST http://localhost:8000/v1/sir/interpret \
-H 'Content-Type: application/json' \
-d '{
"values": ["S", "I", "R"],
"microorganism": "B_ESCHR_COLI",
"antimicrobial": "CIP",
"guideline": "EUCAST 2025",
"persist_run": true,
"persist_metadata": {"source": "lab_system", "batch_id": "20260215"}
}'
# Generate antibiogram
curl -X POST http://localhost:8000/v1/antibiogram/compute \
-H 'Content-Type: application/json' \
-d '{
"rows": [...],
"pathogen_column": "organism",
"antimicrobial_columns": ["AMX", "CIP"],
"minimum": 30
}'
# Query run history
curl -X POST http://localhost:8000/v1/runs/list \
-H 'Content-Type: application/json' \
-d '{"limit": 20, "run_type": "sir"}'
from amr.core.interpretation import as_sir
from amr.core.antibiogram import antibiogram
from amr.core.mo import as_mo
from amr.core.ab_properties import ab_name
import polars as pl
# Interpret susceptibility
results = as_sir(
values=["S", "I", "R"],
mo="B_ESCHR_COLI",
ab="CIP",
guideline="EUCAST 2025"
)
print(results) # ["S", "I", "R"]
# Generate antibiogram
df = pl.read_csv("isolates.csv")
abg = antibiogram(
df,
pathogen_column="organism",
antimicrobial_columns=["AMX", "CIP", "GEN"],
minimum=30
)
print(abg)
# Normalize microorganism
mo_code = as_mo("E. coli")
print(mo_code) # "B_ESCHR_COLI"
# Get antimicrobial properties
name = ab_name("AMX")
print(name) # "Amoxicillin"
Interpret MIC/disk diffusion results using clinical breakpoints:
from amr.core.interpretation import as_sir
# From MIC values
as_sir(
values=[0.5, 2, 16],
mo="B_ESCHR_COLI",
ab="CIP",
guideline="EUCAST 2025",
method="MIC"
)
# Returns: ["I", "R", "R"] (EUCAST ciprofloxacin breakpoints for Enterobacterales: S <= 0.25 mg/L, R > 0.5 mg/L)
# From disk diffusion zones
as_sir(
values=[25, 18, 12],
mo="B_STAP_AURE",
ab="OXA",
guideline="EUCAST 2025",
method="disk"
)
# Returns: ["S", "I", "R"]
# With EUCAST interpretive rules
as_sir(
values=[0.25],
mo="B_ESCHR_COLI",
ab="MEM",
guideline="EUCAST 2025",
add_intrinsic_resistance=True,
interpretive_rules="EUCAST"
)
Create resistance profiles with statistical measures:
from amr.core.antibiogram import antibiogram
import polars as pl
df = pl.DataFrame({
"patient_id": [1, 2, 3, 4, 5],
"organism": ["E. coli", "E. coli", "K. pneumoniae", "E. coli", "K. pneumoniae"],
"AMX": ["R", "S", "R", "R", "R"],
"CIP": ["S", "S", "R", "S", "I"],
"GEN": ["S", "S", "S", "S", "S"]
})
abg = antibiogram(
df,
pathogen_column="organism",
antimicrobial_columns=["AMX", "CIP", "GEN"],
minimum=2, # Minimum isolates per pathogen
combine_SI=False
)
print(abg)
Output:
┌───────────────┬───────┬──────┬─────┐
│ microorganism ┆ AMX   ┆ CIP  ┆ GEN │
│ ---           ┆ ---   ┆ ---  ┆ --- │
│ str           ┆ f64   ┆ f64  ┆ f64 │
╞═══════════════╪═══════╪══════╪═════╡
│ E. coli       ┆ 66.7  ┆ 0.0  ┆ 0.0 │
│ K. pneumoniae ┆ 100.0 ┆ 50.0 ┆ 0.0 │
└───────────────┴───────┴──────┴─────┘
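The percentages in the output above equal the share of isolates reported as R for each organism and drug. To make that arithmetic explicit, here is a minimal standard-library sketch that reproduces the numbers from the same five rows and attaches a Wilson score interval. It is an illustration only, not the package's code: `percent_resistant` and `wilson_interval` are hypothetical helpers, the choice of the Wilson method is an assumption (the README does not say which interval the package uses), and `minimum` is applied per organism/drug pair here.

```python
from collections import defaultdict
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for successes/n, in percent."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (100 * max(0.0, centre - half), 100 * min(1.0, centre + half))


def percent_resistant(rows, pathogen_column, antimicrobial_columns, minimum=1):
    """Return {organism: {drug: (percent_R, (ci_low, ci_high))}}."""
    tested = defaultdict(int)     # (organism, drug) -> isolates with S/I/R
    resistant = defaultdict(int)  # (organism, drug) -> isolates reported R
    for row in rows:
        for drug in antimicrobial_columns:
            result = row.get(drug)
            if result in ("S", "I", "R"):
                key = (row[pathogen_column], drug)
                tested[key] += 1
                if result == "R":
                    resistant[key] += 1
    table = {}
    for (organism, drug), n in tested.items():
        if n >= minimum:  # skip sparsely tested combinations
            r = resistant[(organism, drug)]
            cell = (round(100 * r / n, 1), wilson_interval(r, n))
            table.setdefault(organism, {})[drug] = cell
    return table


rows = [
    {"organism": "E. coli", "AMX": "R", "CIP": "S", "GEN": "S"},
    {"organism": "E. coli", "AMX": "S", "CIP": "S", "GEN": "S"},
    {"organism": "K. pneumoniae", "AMX": "R", "CIP": "R", "GEN": "S"},
    {"organism": "E. coli", "AMX": "R", "CIP": "S", "GEN": "S"},
    {"organism": "K. pneumoniae", "AMX": "R", "CIP": "I", "GEN": "S"},
]
result = percent_resistant(rows, "organism", ["AMX", "CIP", "GEN"], minimum=2)
for organism, drugs in result.items():
    print(organism, {drug: cell[0] for drug, cell in drugs.items()})
```

With only two or three isolates per organism the intervals are very wide, which is why surveillance reports usually set `minimum` to 30, as in the CLI example earlier.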
Standardize organism names with fuzzy matching:
from amr.core.mo import as_mo, mo_name, mo_taxonomy
# Normalize names
as_mo("E. coli") # "B_ESCHR_COLI"
as_mo("Staph aureus") # "B_STAP_AURE"
as_mo("MRSA") # "B_STAP_AURE"
# Get properties
mo_name("B_ESCHR_COLI") # "Escherichia coli"
mo_taxonomy("B_ESCHR_COLI", "genus") # "Escherichia"
mo_taxonomy("B_ESCHR_COLI", "family") # "Enterobacteriaceae"
Screen for multi-drug resistant organisms:
from amr.core.mdro import mdro
import polars as pl
df = pl.DataFrame({
"patient": [1, 2],
"AMX": ["R", "S"],
"CIP": ["R", "S"],
"GEN": ["R", "S"],
"MEM": ["R", "S"]
})
# EUCAST exceptional phenotypes
results = mdro(df, guideline="EUCAST")
print(results) # ["Pos", "Neg"]
Forecast future resistance trends:
from amr.core.prediction import resistance_predict
import polars as pl
from datetime import date
df = pl.DataFrame({
"date": [date(2024, i, 1) for i in range(1, 13)],
"sir": ["S"]*6 + ["R"]*6
})
forecast = resistance_predict(
df,
col_date="date",
col_sir="sir",
model="ARIMA",
forecast_periods=6,
frequency="monthly"
)
print(forecast)
Python AMR uses a canonical NDJSON-based data pipeline with quality assurance built in.
External Sources → Import    → Canonical NDJSON → Transforms → Snapshots (Parquet)
(TSV/CSV/          (Scripts)   (data-raw/         (Python)     (data/snapshots/)
 XLSX/RDA)                      sources/)
| Dataset | Records | Description |
|---|---|---|
| `microorganisms` | 75,000+ | Taxonomic data, prevalence, SNOMED codes |
| `antimicrobials` | 465+ | Antibiotics with ATC, LOINC, PubChem, DDDs |
| `antivirals` | 85+ | Antivirals with LOINC codes |
| `clinical_breakpoints` | 50,000+ | EUCAST/CLSI breakpoints (2010-2025) |
| `interpretive_rules` | 1,300+ | EUCAST expert rules (v10-15) |
| `intrinsic_resistant` | 10,000+ | Natural resistance combinations |
| `translations` | 28 languages | Multi-language support |
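To give a feel for the canonical NDJSON stage and the raw/snapshot parity idea, the following sketch parses newline-delimited JSON, checks that each record carries the expected fields, writes it back deterministically, and compares two record sets by key. It is a standard-library illustration under stated assumptions: the `mo` and `fullname` field names, the `REQUIRED_FIELDS` schema, and all three helper functions are hypothetical and do not describe the project's actual scripts or data contracts.

```python
import json

REQUIRED_FIELDS = {"mo", "fullname"}  # hypothetical minimal schema


def parse_ndjson(text):
    """Parse NDJSON text into a list of dicts, validating required fields."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing {sorted(missing)}")
        records.append(record)
    return records


def to_ndjson(records):
    """Serialize records one per line with sorted keys for stable diffs."""
    lines = (json.dumps(r, sort_keys=True, ensure_ascii=False) for r in records)
    return "\n".join(lines) + "\n"


def parity_report(raw_records, snapshot_records, key="mo"):
    """List keys present on one side but not the other."""
    raw_keys = {r[key] for r in raw_records}
    snapshot_keys = {r[key] for r in snapshot_records}
    return {
        "missing_in_snapshot": sorted(raw_keys - snapshot_keys),
        "extra_in_snapshot": sorted(snapshot_keys - raw_keys),
    }


raw_text = (
    '{"mo": "B_ESCHR_COLI", "fullname": "Escherichia coli"}\n'
    '{"mo": "B_STAP_AURE", "fullname": "Staphylococcus aureus"}\n'
)
raw = parse_ndjson(raw_text)
snapshot = raw[:1]  # pretend one record was lost during a transform
print(parity_report(raw, snapshot))
```

Sorting keys on output keeps the canonical files stable under version control, so a changed record shows up as a one-line diff.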
# Bootstrap NDJSON from snapshots (one-time setup)
PYTHONPATH=src python scripts/export_raw_sources.py
# Import external formats to NDJSON
PYTHONPATH=src python scripts/import_raw_sources.py
# Run full transformation pipeline
PYTHONPATH=src python scripts/run_data_pipeline.py
# Validate schemas
amr validate-schemas
# Check data quality
PYTHONPATH=src python scripts/check_snapshot_raw_parity.py
PYTHONPATH=src python scripts/data_qa_report.py
Update reference data from authoritative sources:
# Refresh LOINC codes
PYTHONPATH=src python scripts/refresh_loinc.py
# Refresh SNOMED CT codes
PYTHONPATH=src python scripts/refresh_snomed.py
# Refresh PubChem data (slow, ~10 min)
PYTHONPATH=src python scripts/refresh_pubchem.py
# Run all refresh pipelines
PYTHONPATH=src python scripts/refresh_all_data.py --dry-run
See Data Refresh Guide for detailed instructions.
Comprehensive documentation organized by audience:
- Getting Started Guide - Installation and first steps
- Core Workflows - Common use cases and examples
- Migration Guide - Migrating from R AMR
- System Architecture - Component overview and dependencies
- Runtime Sequences - Request flow and async patterns
- Storage Architecture - Database design and persistence
- FastAPI Integration - API design and async handlers
- DuckDB Engine - Query optimization and performance
- Async Architecture - Event loops and concurrency
- Data Flow - End-to-end data pipeline
- API Reference - Endpoint documentation
- CLI Reference - Command-line usage
- Configuration Reference - Environment variables
- Data Contracts - Schema specifications
- Data Pipeline Deep Dive - Transform architecture
- Data Refresh Guide - Updating reference data
- External Ingest Playbook - Custom data import
- Deployment Guide - Production deployment patterns
- Operations Runbook - Day-to-day operations
- Incident Response - Troubleshooting and recovery
- Observability Metrics - Monitoring and alerting
- Security and Data Handling - Security best practices
- Testing Strategy - Unit, integration, parity tests
- Testing Strategy Details - Comprehensive testing guide
- Performance Guardrails - Benchmarking and regression detection
- Parity and Reproducibility - R AMR compatibility
- Breakpoints and Guidelines - EUCAST/CLSI explained
- CLSI Support - CLSI breakpoints vs interpretive rules
- Analytics Semantics - Statistical methods
- Prediction Methodology - Time-series forecasting
- Contributing Docs - Documentation standards
- Glossary - AMR terminology
- Roadmap - Future enhancements
# Clone and install
git clone https://github.com/beak-insights/AMR.git
cd AMR/python-amr
python3.14 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
# Run linters
ruff check src tests scripts
ruff format src tests scripts
# Type checking
mypy src
# Run tests
pytest -v
pytest tests/unit -v
pytest tests/api -v
pytest tests/parity -v
Run the complete quality gate locally:
./scripts/run_local_ci.sh
This runs:
- Code formatting (ruff)
- Type checking (mypy)
- Unit tests (pytest)
- API integration tests
- Parity checks (raw/snapshot consistency)
- Performance regression checks
python-amr/
├── src/amr/ # Package source
│ ├── api/ # FastAPI application
│ ├── cli/ # Typer CLI commands
│ ├── core/ # Domain logic
│ ├── data/ # Reference data and pipelines
│ │ ├── ingest/ # Multi-format import
│ │ ├── pipeline/ # Transform orchestration
│ │ ├── transforms/ # Dataset transformations
│ │ ├── qa/ # Quality assurance
│ │ └── refresh/ # Data refresh pipelines
│ ├── repositories/ # Persistence layer
│ ├── engines/ # DuckDB helpers
│ └── compat/ # R compatibility aliases
├── tests/ # Test suites
│ ├── unit/ # Unit tests
│ ├── api/ # API integration tests
│ ├── parity/ # R compatibility tests
│ ├── golden/ # Regression tests
│ └── perf/ # Performance tests
├── scripts/ # Operational scripts
├── data/ # Runtime data
│ ├── snapshots/ # Parquet datasets
│ └── manifests/ # Metadata and QA reports
├── data-raw/ # Canonical sources
│ ├── sources/ # NDJSON datasets
│ └── external/ # Import staging
├── docs/ # Documentation
└── benchmarks/ # Performance baselines
Python AMR has comprehensive test coverage with multiple test types:
# Unit tests - Core logic (fast)
pytest tests/unit -v
# API tests - Integration tests (medium)
pytest tests/api -v
# Parity tests - R AMR compatibility (slow)
pytest tests/parity -v -m "not slow"
# Golden tests - Regression protection
pytest tests/golden -v
# Performance tests - Throughput guardrails
pytest tests/perf -v
# Generate coverage report
pytest --cov=amr --cov-report=html --cov-report=term
open htmlcov/index.html
The GitHub Actions workflow runs on every push:
- Linting (ruff)
- Type checking (mypy)
- Full test suite
- Data pipeline parity checks
- Performance regression detection
See .github/workflows/python-amr-ci.yml for details.
Python AMR is optimized for high-throughput scenarios:
| Operation | Records | Time | Throughput |
|---|---|---|---|
| SIR interpretation | 100,000 | 0.8s | 125k/sec |
| Antibiogram | 10,000 | 0.3s | 33k/sec |
| MO normalization | 50,000 | 1.2s | 42k/sec |
| Breakpoint query | 1,000 | 0.05s | 20k/sec |
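The throughput column is simply records divided by elapsed time, and a performance guardrail can be thought of as a floor derived from such a baseline. The sketch below shows that arithmetic and one possible regression check. The helper names and the 20% tolerance are assumptions for illustration and are not taken from the project's `tests/perf` suite.

```python
import time


def throughput(records, seconds):
    """Records processed per second."""
    return records / seconds


def measure(func, items):
    """Time one call of func(items) and return observed records per second."""
    start = time.perf_counter()
    func(items)
    elapsed = time.perf_counter() - start
    return len(items) / elapsed if elapsed > 0 else float("inf")


def within_guardrail(observed, baseline, tolerance=0.2):
    """True if observed throughput is no more than `tolerance` below baseline."""
    return observed >= baseline * (1 - tolerance)


# Baselines recomputed from the table: (records, seconds).
table = {
    "SIR interpretation": (100_000, 0.8),
    "Antibiogram": (10_000, 0.3),
    "MO normalization": (50_000, 1.2),
    "Breakpoint query": (1_000, 0.05),
}
for name, (records, seconds) in table.items():
    print(f"{name}: {throughput(records, seconds):,.0f} records/sec")
```

A check such as `within_guardrail(measure(my_function, sample), baseline)` could then fail a test run when throughput drops well below the recorded baseline. Absolute numbers depend heavily on hardware, so baselines are best recorded per machine.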
- Polars DataFrames - Parallel execution with lazy evaluation
- DuckDB Analytics - Columnar storage with vectorized execution
- Async I/O - Non-blocking persistence
- Worker Pools - Configurable parallelism
- Batch Processing - Automatic batching for large datasets
# Increase async workers
export AMR_PERSIST_QUEUE_WORKERS=8
# Increase queue size
export AMR_PERSIST_QUEUE_MAXSIZE=8192
# Adjust retry behavior
export AMR_PERSIST_RETRY_MAX_RETRIES=5
export AMR_PERSIST_RETRY_BACKOFF_MS=100
See Performance Guardrails for detailed tuning.
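To show how settings like these typically interact, here is a minimal `asyncio` sketch of a bounded queue drained by several workers, with retries, exponential backoff, and a dead-letter list for items that keep failing. It is a conceptual illustration, not the package's persistence layer. The mapping of `workers`, `maxsize`, `max_retries` and `backoff_ms` onto the `AMR_PERSIST_*` variables is an assumption, as is the doubling backoff, and `drain`, `worker`, and `persist_with_retry` are hypothetical names.

```python
import asyncio


async def persist_with_retry(item, store, max_retries, backoff_ms):
    """Try store(item) up to max_retries + 1 times; return True on success."""
    for attempt in range(max_retries + 1):
        try:
            await store(item)
            return True
        except Exception:
            if attempt == max_retries:
                return False
            # Exponential backoff: backoff_ms, then 2x, 4x, ...
            await asyncio.sleep(backoff_ms * (2 ** attempt) / 1000)
    return False


async def worker(queue, store, dead_letters, max_retries, backoff_ms):
    """Consume items forever; failed items go to the dead-letter list."""
    while True:
        item = await queue.get()
        try:
            if not await persist_with_retry(item, store, max_retries, backoff_ms):
                dead_letters.append(item)  # kept for later inspection or replay
        finally:
            queue.task_done()


async def drain(items, store, workers=2, maxsize=16, max_retries=2, backoff_ms=1):
    """Push items through a bounded queue and return the dead letters."""
    queue = asyncio.Queue(maxsize=maxsize)
    dead_letters = []
    tasks = [
        asyncio.create_task(worker(queue, store, dead_letters, max_retries, backoff_ms))
        for _ in range(workers)
    ]
    for item in items:
        await queue.put(item)  # blocks when the queue is full (backpressure)
    await queue.join()         # wait until every item has been handled
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return dead_letters


async def demo():
    stored = []

    async def store(item):
        if item == "bad":
            raise RuntimeError("simulated storage failure")
        stored.append(item)

    dead = await drain(["run-1", "bad", "run-2"], store)
    return stored, dead


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

More workers raise concurrency, a larger queue absorbs bursts before producers block, and the retry settings decide how long a failing item is retried before it lands in the dead-letter list.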
Python AMR ports the R AMR package with all critical features implemented. Data refresh is still partial, and the known gaps are listed after the table.
| Feature | Status | Notes |
|---|---|---|
| SIR interpretation | Complete | EUCAST/CLSI breakpoints |
| Antibiogram | Complete | Full statistical measures |
| MO/AB/AV normalization | Complete | 75k+ organisms, 465+ antibiotics |
| MDRO detection | Complete | EUCAST guidelines |
| First isolate | Complete | Episode-based deduplication |
| Breakpoint queries | Complete | 50k+ breakpoints |
| EUCAST rules | Complete | ~1,300 interpretive rules |
| CLSI breakpoints | Complete | 2010-2025 data |
| Veterinary support | Complete | 2,436 animal breakpoints |
| Translation | Complete | 28 languages |
| Resistance prediction | Complete | ARIMA/exponential smoothing |
| Data refresh | Partial | LOINC, SNOMED, PubChem (not taxonomy) |
- CLSI interpretive rules - Not available (see CLSI Support)
- Taxonomy refresh - Framework only, merge logic pending
- ATC code refresh - Not yet implemented
- WHONET code refresh - Not yet implemented
See Port Status for complete compatibility matrix.
We welcome contributions! Please see our contributing guidelines:
- Code contributions - Follow PEP 8, add tests, update docs
- Documentation - See Contributing Docs
- Bug reports - Use GitHub issues with reproducible examples
- Feature requests - Discuss in issues before implementation
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes with tests
- Run local CI gate (`./scripts/run_local_ci.sh`)
- Commit with descriptive messages
- Push and create a pull request
This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.
This is the same license as the R AMR package to maintain compatibility.
This Python port is based on the excellent R AMR package created by:
- Matthijs S. Berends (maintainer)
- Christian F. Luz
- Alexander W. Friedrich
- Bhanu N. M.
- Casper J. Albers
- And many other contributors
- Implementation: Claude (Anthropic) under guidance from Beak Insights team
- Validation: Parity testing against R AMR package outputs
- Infrastructure: FastAPI, Polars, DuckDB, SQLAlchemy communities
- EUCAST - European Committee on Antimicrobial Susceptibility Testing
- CLSI - Clinical and Laboratory Standards Institute
- LOINC - Logical Observation Identifiers Names and Codes
- SNOMED CT - Systematized Nomenclature of Medicine
- PubChem - National Library of Medicine
- GBIF - Global Biodiversity Information Facility
- LPSN - List of Prokaryotic names with Standing in Nomenclature
- Documentation: docs/
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Version: 0.1.0 | Python: 3.14+ | Last Updated: 2026-02-15 | Status: Production Ready (with documented limitations)