
Python AMR

Production-grade antimicrobial resistance (AMR) surveillance toolkit for Python 3.14+

Python AMR is a complete native Python port of the R AMR package, delivering high-performance AMR analytics without runtime R dependencies. Built on async architecture, DuckDB, and Polars, it provides clinical microbiologists, epidemiologists, and data engineers with enterprise-ready tools for resistance surveillance, antibiogram generation, and predictive analytics.

Python 3.14+ | License: GPL-2.0



Overview

Python AMR provides comprehensive antimicrobial resistance analytics through three interfaces:

  1. CLI (amr command) - For scripting and automation
  2. REST API (FastAPI) - For integration with web applications and services
  3. Python Library - For programmatic access in notebooks and applications

All interfaces share the same high-performance core engine with async persistence, DuckDB-accelerated analytics, and production-grade observability.


Key Features

Clinical Analytics

  • SIR Interpretation - EUCAST/CLSI guideline-based susceptibility interpretation
  • Antibiogram Generation - Automated resistance profiles with confidence intervals
  • Breakpoint Queries - Clinical and epidemiological breakpoint lookup
  • MDRO Detection - Multi-drug resistant organism screening (EUCAST guidelines)
  • First Isolate Selection - Episode-based deduplication for surveillance
  • Resistance Prediction - Time-series forecasting with ARIMA/exponential smoothing

Data Normalization

  • Microorganism Codes - 75,000+ organisms with taxonomy, SNOMED, and prevalence data
  • Antimicrobial Codes - 465+ antibiotics with ATC, LOINC, PubChem, and DDDs
  • Antiviral Codes - 85+ antivirals with LOINC codes
  • Fuzzy Matching - Intelligent text extraction and normalization

Production Infrastructure

  • Async Persistence - Non-blocking run storage with queue-based workers
  • DuckDB Engine - Columnar analytics for high-throughput queries
  • Dead-letter Queue - Automatic failure capture with replay capability
  • Observability - Prometheus-compatible metrics, structured logging
  • Run Auditing - Full lineage tracking with metadata and provenance
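
The queue-based persistence with dead-letter capture described above can be sketched with plain asyncio. The names here (worker, store, dead_letter) and the simulated failure are illustrative, not the package's actual API:

```python
import asyncio

async def worker(queue: asyncio.Queue, store: list, dead_letter: list) -> None:
    """Drain the queue; failed writes land in the dead-letter list for replay."""
    while True:
        run = await queue.get()
        try:
            if run.get("fail"):                # simulate a storage failure
                raise IOError("storage unavailable")
            store.append(run)                  # successful persist
        except IOError:
            dead_letter.append(run)            # captured for later replay
        finally:
            queue.task_done()

async def main() -> tuple[list, list]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    store, dead_letter = [], []
    task = asyncio.create_task(worker(queue, store, dead_letter))
    await queue.put({"run_id": 1})
    await queue.put({"run_id": 2, "fail": True})
    await queue.join()                         # wait until both runs are handled
    task.cancel()
    return store, dead_letter

store, dlq = asyncio.run(main())
print(store, dlq)
```

Because the producer only awaits the queue, request handling stays non-blocking even when storage is slow or failing.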

Standards Compliance

  • EUCAST Guidelines - v10-15 with ~1,300 interpretive rules
  • CLSI Guidelines - 2010-2025 breakpoints (interpretive rules not available)
  • WHONET Format - Import/export compatibility
  • Veterinary Support - Animal host breakpoints (2,436 veterinary rules)
  • Multi-language - 28-language translation support

Architecture Highlights

┌─────────────────────────────────────────────────────────────┐
│                    Interfaces Layer                         │
│  ┌──────────────┬──────────────────┬────────────────────┐  │
│  │  CLI (Typer) │  API (FastAPI)   │  Python Library    │  │
│  └──────────────┴──────────────────┴────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────┐
│                     Core Engine                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  amr.core - Domain Logic                            │   │
│  │  • SIR interpretation • Antibiograms • Analytics    │   │
│  │  • MO/AB/AV normalization • MDRO detection          │   │
│  │  • Breakpoint resolution • Predictive models        │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────┐
│                   Data & Persistence                        │
│  ┌──────────────────┬─────────────────────────────────┐    │
│  │  Reference Data  │  Run Storage                    │    │
│  │  • Parquet       │  • SQLAlchemy (SQL metadata)    │    │
│  │  • NDJSON        │  • DuckDB (columnar analytics)  │    │
│  │  • Polars        │  • Async queue workers          │    │
│  └──────────────────┴─────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

Technology Stack

  • Python 3.14 - Latest performance improvements and typing features
  • Polars - High-performance DataFrame library with lazy, multi-threaded execution (typically much faster than pandas on analytical workloads)
  • DuckDB - OLAP database for analytical queries
  • SQLAlchemy 2.0 - Async ORM for metadata persistence
  • FastAPI - Modern async web framework
  • Typer - CLI with rich help and validation
  • Pydantic 2.0 - Data validation and serialization

Installation

Prerequisites

  • Python 3.14.x (required)
  • 4GB+ RAM recommended for large datasets
  • 1GB+ disk space for reference data

Install from Source

# Clone repository
git clone https://github.com/beak-insights/AMR.git
cd AMR/python-amr

# Create virtual environment
python3.14 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install with development dependencies
pip install -e ".[dev]"

# Verify installation
amr --version
pytest tests/unit/test_interpretation.py -v

Install Optional Dependencies

# For XLSX import support
pip install pandas openpyxl

# For PostgreSQL persistence
pip install asyncpg

# For advanced plotting
pip install seaborn plotly

Quick Start

CLI Usage

# Interpret susceptibility results
echo '[{"value": "S"}, {"value": "R"}]' | \
  amr sir - --microorganism B_ESCHR_COLI --antimicrobial AMX --guideline "EUCAST 2025"

# Generate antibiogram from CSV
amr antibiogram data/isolates.csv \
  --pathogen-column organism \
  --antimicrobial-columns AMX,CIP,GEN \
  --minimum 30

# Predict resistance trends
amr predict-resistance trends.csv \
  --date-column collection_date \
  --sir-column amoxicillin_sir \
  --frequency monthly \
  --forecast-periods 6

# Normalize microorganism names
amr mo-normalize '["E. coli", "Staph aureus", "Pseudomonas"]'

# Query clinical breakpoints
amr breakpoints-query \
  --guideline "EUCAST 2025" \
  --mo B_ESCHR_COLI \
  --ab CIP \
  --method MIC

API Usage

Start the server:

source .venv/bin/activate
PYTHONPATH=src uvicorn amr.api.app:app --host 0.0.0.0 --port 8000

Make requests:

# Health check
curl http://localhost:8000/health

# SIR interpretation
curl -X POST http://localhost:8000/v1/sir/interpret \
  -H 'Content-Type: application/json' \
  -d '{
    "values": ["S", "I", "R"],
    "microorganism": "B_ESCHR_COLI",
    "antimicrobial": "CIP",
    "guideline": "EUCAST 2025",
    "persist_run": true,
    "persist_metadata": {"source": "lab_system", "batch_id": "20260215"}
  }'

# Generate antibiogram
curl -X POST http://localhost:8000/v1/antibiogram/compute \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [...],
    "pathogen_column": "organism",
    "antimicrobial_columns": ["AMX", "CIP"],
    "minimum": 30
  }'

# Query run history
curl -X POST http://localhost:8000/v1/runs/list \
  -H 'Content-Type: application/json' \
  -d '{"limit": 20, "run_type": "sir"}'
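
The same /v1/sir/interpret call can be issued from Python with only the standard library. The payload mirrors the curl example above; actually sending the request requires the API server to be running locally:

```python
import json
import urllib.request

payload = {
    "values": ["S", "I", "R"],
    "microorganism": "B_ESCHR_COLI",
    "antimicrobial": "CIP",
    "guideline": "EUCAST 2025",
}

# Build the POST request exactly as in the curl example above
req = urllib.request.Request(
    "http://localhost:8000/v1/sir/interpret",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

def interpret() -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# interpret()  # uncomment with the API server running
```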

Python Library

from amr.core.interpretation import as_sir
from amr.core.antibiogram import antibiogram
from amr.core.mo import as_mo
from amr.core.ab_properties import ab_name
import polars as pl

# Interpret susceptibility
results = as_sir(
    values=["S", "I", "R"],
    mo="B_ESCHR_COLI",
    ab="CIP",
    guideline="EUCAST 2025"
)
print(results)  # ["S", "I", "R"]

# Generate antibiogram
df = pl.read_csv("isolates.csv")
abg = antibiogram(
    df,
    pathogen_column="organism",
    antimicrobial_columns=["AMX", "CIP", "GEN"],
    minimum=30
)
print(abg)

# Normalize microorganism
mo_code = as_mo("E. coli")
print(mo_code)  # "B_ESCHR_COLI"

# Get antimicrobial properties
name = ab_name("AMX")
print(name)  # "Amoxicillin"

Core Capabilities

1. SIR Interpretation

Interpret MIC/disk diffusion results using clinical breakpoints:

from amr.core.interpretation import as_sir

# From MIC values
as_sir(
    values=[0.5, 2, 16],
    mo="B_ESCHR_COLI",
    ab="CIP",
    guideline="EUCAST 2025",
    method="MIC"
)
# Returns: ["S", "S", "R"]

# From disk diffusion zones
as_sir(
    values=[25, 18, 12],
    mo="B_STAP_AURE",
    ab="OXA",
    guideline="EUCAST 2025",
    method="disk"
)
# Returns: ["S", "I", "R"]

# With EUCAST interpretive rules
as_sir(
    values=[0.25],
    mo="B_ESCHR_COLI",
    ab="MEM",
    guideline="EUCAST 2025",
    add_intrinsic_resistance=True,
    interpretive_rules="EUCAST"
)

2. Antibiogram Generation

Create resistance profiles with statistical measures:

from amr.core.antibiogram import antibiogram
import polars as pl

df = pl.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "organism": ["E. coli", "E. coli", "K. pneumoniae", "E. coli", "K. pneumoniae"],
    "AMX": ["R", "S", "R", "R", "R"],
    "CIP": ["S", "S", "R", "S", "I"],
    "GEN": ["S", "S", "S", "S", "S"]
})

abg = antibiogram(
    df,
    pathogen_column="organism",
    antimicrobial_columns=["AMX", "CIP", "GEN"],
    minimum=2,  # Minimum isolates per pathogen
    combine_SI=False
)
print(abg)

Output:

┌───────────────┬───────┬──────┬─────┐
│ microorganism ┆ AMX   ┆ CIP  ┆ GEN │
│ ---           ┆ ---   ┆ ---  ┆ --- │
│ str           ┆ f64   ┆ f64  ┆ f64 │
╞═══════════════╪═══════╪══════╪═════╡
│ E. coli       ┆ 66.7  ┆ 0.0  ┆ 0.0 │
│ K. pneumoniae ┆ 100.0 ┆ 50.0 ┆ 0.0 │
└───────────────┴───────┴──────┴─────┘
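
The percentages are resistance rates per pathogen: %R = resistant isolates / tested isolates × 100, reported only where the isolate count meets minimum. A pure-Python sketch of that arithmetic for the AMX column (the package computes this with Polars):

```python
from collections import defaultdict

# (organism, AMX result) pairs from the example DataFrame above
rows = [
    ("E. coli", "R"), ("E. coli", "S"), ("E. coli", "R"),
    ("K. pneumoniae", "R"), ("K. pneumoniae", "R"),
]

# Tally resistant and total isolates per pathogen
counts = defaultdict(lambda: {"R": 0, "n": 0})
for organism, sir in rows:
    counts[organism]["n"] += 1
    counts[organism]["R"] += sir == "R"

minimum = 2  # suppress pathogens with too few isolates
pct_r = {
    org: round(100 * c["R"] / c["n"], 1)
    for org, c in counts.items()
    if c["n"] >= minimum
}
print(pct_r)  # {'E. coli': 66.7, 'K. pneumoniae': 100.0}
```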

3. Microorganism Normalization

Standardize organism names with fuzzy matching:

from amr.core.mo import as_mo, mo_name, mo_taxonomy

# Normalize names
as_mo("E. coli")          # "B_ESCHR_COLI"
as_mo("Staph aureus")     # "B_STAP_AURE"
as_mo("MRSA")             # "B_STAP_AURE"

# Get properties
mo_name("B_ESCHR_COLI")   # "Escherichia coli"
mo_taxonomy("B_ESCHR_COLI", "genus")  # "Escherichia"
mo_taxonomy("B_ESCHR_COLI", "family") # "Enterobacteriaceae"

4. MDRO Detection

Screen for multi-drug resistant organisms:

from amr.core.mdro import mdro
import polars as pl

df = pl.DataFrame({
    "patient": [1, 2],
    "AMX": ["R", "S"],
    "CIP": ["R", "S"],
    "GEN": ["R", "S"],
    "MEM": ["R", "S"]
})

# EUCAST exceptional phenotypes
results = mdro(df, guideline="EUCAST")
print(results)  # ["Pos", "Neg"]

5. Resistance Prediction

Forecast future resistance trends:

from amr.core.prediction import resistance_predict
import polars as pl
from datetime import date

df = pl.DataFrame({
    "date": [date(2024, i, 1) for i in range(1, 13)],
    "sir": ["S"]*6 + ["R"]*6
})

forecast = resistance_predict(
    df,
    col_date="date",
    col_sir="sir",
    model="ARIMA",
    forecast_periods=6,
    frequency="monthly"
)
print(forecast)
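
For intuition, the exponential-smoothing option can be sketched in a few lines: smooth the observed resistance proportions, then carry the final level forward (simple smoothing yields a flat forecast; ARIMA additionally models trend and seasonality). This helper is illustrative, not the package's implementation:

```python
def exp_smooth_forecast(series: list[float], alpha: float = 0.5,
                        periods: int = 6) -> list[float]:
    """Simple exponential smoothing; the forecast repeats the final level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level  # blend new observation into level
    return [round(level, 3)] * periods

# Monthly resistance proportions matching the S->R shift in the example above
monthly_r = [0.0] * 6 + [1.0] * 6
print(exp_smooth_forecast(monthly_r, alpha=0.5, periods=3))  # [0.984, 0.984, 0.984]
```

With alpha=0.5 the level converges quickly toward the recent all-resistant months, which is why the forecast sits near 1.0.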

Data Pipeline

Python AMR uses a canonical NDJSON-based data pipeline with quality assurance built in.

Pipeline Architecture

External Sources   →  Import     →  Canonical NDJSON     →  Transforms  →  Snapshots (Parquet)
(TSV/CSV/XLSX/RDA)    (Scripts)     (data-raw/sources/)     (Python)       (data/snapshots/)
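
The canonical sources are NDJSON: one JSON record per line, which keeps them diff-friendly and streamable. Reading them needs only the standard library; the records below are invented for illustration:

```python
import io
import json

# One record per line, as in data-raw/sources/ (content invented here)
ndjson = io.StringIO(
    '{"mo": "B_ESCHR_COLI", "fullname": "Escherichia coli"}\n'
    '{"mo": "B_STAP_AURE", "fullname": "Staphylococcus aureus"}\n'
)

# Stream line by line; blank lines are skipped
records = [json.loads(line) for line in ndjson if line.strip()]
print(len(records), records[0]["mo"])  # 2 B_ESCHR_COLI
```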

Reference Datasets

Dataset                Records     Description
microorganisms         75,000+     Taxonomic data, prevalence, SNOMED codes
antimicrobials         465+        Antibiotics with ATC, LOINC, PubChem, DDDs
antivirals             85+         Antivirals with LOINC codes
clinical_breakpoints   50,000+     EUCAST/CLSI breakpoints (2010-2025)
interpretive_rules     1,300+      EUCAST expert rules (v10-15)
intrinsic_resistant    10,000+     Natural resistance combinations
translations           28 langs    Multi-language support

Pipeline Commands

# Bootstrap NDJSON from snapshots (one-time setup)
PYTHONPATH=src python scripts/export_raw_sources.py

# Import external formats to NDJSON
PYTHONPATH=src python scripts/import_raw_sources.py

# Run full transformation pipeline
PYTHONPATH=src python scripts/run_data_pipeline.py

# Validate schemas
amr validate-schemas

# Check data quality
PYTHONPATH=src python scripts/check_snapshot_raw_parity.py
PYTHONPATH=src python scripts/data_qa_report.py

Data Refresh

Update reference data from authoritative sources:

# Refresh LOINC codes
PYTHONPATH=src python scripts/refresh_loinc.py

# Refresh SNOMED CT codes
PYTHONPATH=src python scripts/refresh_snomed.py

# Refresh PubChem data (slow, ~10 min)
PYTHONPATH=src python scripts/refresh_pubchem.py

# Run all refresh pipelines
PYTHONPATH=src python scripts/refresh_all_data.py --dry-run

See Data Refresh Guide for detailed instructions.


Documentation

Comprehensive documentation organized by audience:

  • Getting Started
  • Architecture & Design
  • API & CLI Reference
  • Data Management
  • Operations & Deployment
  • Testing & Quality
  • Domain Knowledge
  • Contributing


Development

Setup Development Environment

# Clone and install
git clone https://github.com/beak-insights/AMR.git
cd AMR/python-amr
python3.14 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run linters
ruff check src tests scripts
ruff format src tests scripts

# Type checking
mypy src

# Run tests
pytest -v
pytest tests/unit -v
pytest tests/api -v
pytest tests/parity -v

Local CI Gate

Run the complete quality gate locally:

./scripts/run_local_ci.sh

This runs:

  1. Code formatting (ruff)
  2. Type checking (mypy)
  3. Unit tests (pytest)
  4. API integration tests
  5. Parity checks (raw/snapshot consistency)
  6. Performance regression checks

Project Structure

python-amr/
├── src/amr/                    # Package source
│   ├── api/                    # FastAPI application
│   ├── cli/                    # Typer CLI commands
│   ├── core/                   # Domain logic
│   ├── data/                   # Reference data and pipelines
│   │   ├── ingest/            # Multi-format import
│   │   ├── pipeline/          # Transform orchestration
│   │   ├── transforms/        # Dataset transformations
│   │   ├── qa/                # Quality assurance
│   │   └── refresh/           # Data refresh pipelines
│   ├── repositories/          # Persistence layer
│   ├── engines/               # DuckDB helpers
│   └── compat/                # R compatibility aliases
├── tests/                      # Test suites
│   ├── unit/                  # Unit tests
│   ├── api/                   # API integration tests
│   ├── parity/                # R compatibility tests
│   ├── golden/                # Regression tests
│   └── perf/                  # Performance tests
├── scripts/                    # Operational scripts
├── data/                       # Runtime data
│   ├── snapshots/             # Parquet datasets
│   └── manifests/             # Metadata and QA reports
├── data-raw/                   # Canonical sources
│   ├── sources/               # NDJSON datasets
│   └── external/              # Import staging
├── docs/                       # Documentation
└── benchmarks/                 # Performance baselines

Testing

Python AMR has comprehensive test coverage with multiple test types:

Test Categories

# Unit tests - Core logic (fast)
pytest tests/unit -v

# API tests - Integration tests (medium)
pytest tests/api -v

# Parity tests - R AMR compatibility (slow)
pytest tests/parity -v -m "not slow"

# Golden tests - Regression protection
pytest tests/golden -v

# Performance tests - Throughput guardrails
pytest tests/perf -v

Code Coverage

# Generate coverage report
pytest --cov=amr --cov-report=html --cov-report=term
open htmlcov/index.html

Continuous Integration

GitHub Actions workflow runs on every push:

  • Linting (ruff)
  • Type checking (mypy)
  • Full test suite
  • Data pipeline parity checks
  • Performance regression detection

See .github/workflows/python-amr-ci.yml for details.


Performance

Python AMR is optimized for high-throughput scenarios:

Benchmarks

Operation            Records    Time     Throughput
SIR interpretation   100,000    0.8s     125k/sec
Antibiogram          10,000     0.3s     33k/sec
MO normalization     50,000     1.2s     42k/sec
Breakpoint query     1,000      0.05s    20k/sec

Performance Features

  • Polars DataFrames - Parallel execution with lazy evaluation
  • DuckDB Analytics - Columnar storage with vectorized execution
  • Async I/O - Non-blocking persistence
  • Worker Pools - Configurable parallelism
  • Batch Processing - Automatic batching for large datasets

Tuning

# Increase async workers
export AMR_PERSIST_QUEUE_WORKERS=8

# Increase queue size
export AMR_PERSIST_QUEUE_MAXSIZE=8192

# Adjust retry behavior
export AMR_PERSIST_RETRY_MAX_RETRIES=5
export AMR_PERSIST_RETRY_BACKOFF_MS=100
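
A common pattern for consuming such settings in code (the variable names match the block above; the env_int helper is illustrative, not part of the package):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    raw = os.environ.get(name)
    return int(raw) if raw else default

os.environ["AMR_PERSIST_QUEUE_WORKERS"] = "8"
os.environ.pop("AMR_PERSIST_QUEUE_MAXSIZE", None)  # ensure unset for the demo

workers = env_int("AMR_PERSIST_QUEUE_WORKERS", 4)   # set above -> 8
maxsize = env_int("AMR_PERSIST_QUEUE_MAXSIZE", 4096)  # unset -> default
print(workers, maxsize)  # 8 4096
```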

See Performance Guardrails for detailed tuning.


Port Status

Python AMR is a complete port of the R AMR package with all critical features implemented.

Implemented Features

Feature                  Status     Notes
SIR interpretation       Complete   EUCAST/CLSI breakpoints
Antibiogram              Complete   Full statistical measures
MO/AB/AV normalization   Complete   75k+ organisms, 465+ antibiotics
MDRO detection           Complete   EUCAST guidelines
First isolate            Complete   Episode-based deduplication
Breakpoint queries       Complete   50k+ breakpoints
EUCAST rules             Complete   ~1,300 interpretive rules
CLSI breakpoints         Complete   2010-2025 data
Veterinary support       Complete   2,436 animal breakpoints
Translation              Complete   28 languages
Resistance prediction    Complete   ARIMA/exponential smoothing
Data refresh             Partial    LOINC, SNOMED, PubChem (not taxonomy)

Known Limitations

  • CLSI interpretive rules - Not available (see CLSI Support)
  • Taxonomy refresh - Framework only, merge logic pending
  • ATC code refresh - Not yet implemented
  • WHONET code refresh - Not yet implemented

See Port Status for complete compatibility matrix.


Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Code contributions - Follow PEP 8, add tests, update docs
  2. Documentation - See Contributing Docs
  3. Bug reports - Use GitHub issues with reproducible examples
  4. Feature requests - Discuss in issues before implementation

Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes with tests
  4. Run local CI gate (./scripts/run_local_ci.sh)
  5. Commit with descriptive messages
  6. Push and create a pull request

License

This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.

This is the same license as the R AMR package to maintain compatibility.


Acknowledgments

R AMR Package Authors

This Python port is based on the excellent R AMR package created by:

  • Matthijs S. Berends (maintainer)
  • Christian F. Luz
  • Alexander W. Friedrich
  • Bhanu N. M.
  • Casper J. Albers
  • And many other contributors

Python Port

  • Implementation: Claude (Anthropic) under guidance from Beak Insights team
  • Validation: Parity testing against R AMR package outputs
  • Infrastructure: FastAPI, Polars, DuckDB, SQLAlchemy communities

Data Sources

  • EUCAST - European Committee on Antimicrobial Susceptibility Testing
  • CLSI - Clinical and Laboratory Standards Institute
  • LOINC - Logical Observation Identifiers Names and Codes
  • SNOMED CT - Systematized Nomenclature of Medicine
  • PubChem - National Library of Medicine
  • GBIF - Global Biodiversity Information Facility
  • LPSN - List of Prokaryotic names with Standing in Nomenclature


Version: 0.1.0 · Python: 3.14+ · Last Updated: 2026-02-15 · Status: Production Ready (with documented limitations)
