
Python AMR

Production-grade antimicrobial resistance (AMR) surveillance toolkit for Python 3.14+

Python AMR is a complete native Python port of the R AMR package, delivering high-performance AMR analytics without runtime R dependencies. Built on async architecture, DuckDB, and Polars, it provides clinical microbiologists, epidemiologists, and data engineers with enterprise-ready tools for resistance surveillance, antibiogram generation, and predictive analytics.

Python 3.14+ | License: GPL-2.0



Overview

Python AMR provides comprehensive antimicrobial resistance analytics through three interfaces:

  1. CLI (amr command) - For scripting and automation
  2. REST API (FastAPI) - For integration with web applications and services
  3. Python Library - For programmatic access in notebooks and applications

All interfaces share the same high-performance core engine with async persistence, DuckDB-accelerated analytics, and production-grade observability.


Key Features

Clinical Analytics

  • SIR Interpretation - EUCAST/CLSI guideline-based susceptibility interpretation
  • Antibiogram Generation - Automated resistance profiles with confidence intervals
  • Breakpoint Queries - Clinical and epidemiological breakpoint lookup
  • MDRO Detection - Multi-drug resistant organism screening (EUCAST guidelines)
  • First Isolate Selection - Episode-based deduplication for surveillance
  • Resistance Prediction - Time-series forecasting with ARIMA/exponential smoothing

Data Normalization

  • Microorganism Codes - 75,000+ organisms with taxonomy, SNOMED, and prevalence data
  • Antimicrobial Codes - 465+ antibiotics with ATC, LOINC, PubChem, and DDDs
  • Antiviral Codes - 85+ antivirals with LOINC codes
  • Fuzzy Matching - Intelligent text extraction and normalization

Production Infrastructure

  • Async Persistence - Non-blocking run storage with queue-based workers
  • DuckDB Engine - Columnar analytics for high-throughput queries
  • Dead-letter Queue - Automatic failure capture with replay capability
  • Observability - Prometheus-compatible metrics, structured logging
  • Run Auditing - Full lineage tracking with metadata and provenance
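
The queue-based persistence with dead-letter capture described above can be sketched with plain asyncio. The names here (worker, store, dead_letter) and the simulated failure are illustrative, not the package's actual API:

```python
import asyncio

async def worker(queue: asyncio.Queue, store: list, dead_letter: list) -> None:
    """Drain the queue; failed writes land in the dead-letter list for replay."""
    while True:
        run = await queue.get()
        try:
            if run.get("fail"):                # simulate a storage failure
                raise IOError("storage unavailable")
            store.append(run)                  # successful persist
        except IOError:
            dead_letter.append(run)            # captured for later replay
        finally:
            queue.task_done()

async def main() -> tuple[list, list]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    store, dead_letter = [], []
    task = asyncio.create_task(worker(queue, store, dead_letter))
    await queue.put({"run_id": 1})
    await queue.put({"run_id": 2, "fail": True})
    await queue.join()                         # wait until both runs are handled
    task.cancel()
    return store, dead_letter

store, dlq = asyncio.run(main())
print(store, dlq)
```

Because the producer only awaits the queue, request handling stays non-blocking even when storage is slow or failing.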

Standards Compliance

  • EUCAST Guidelines - v10-15 with ~1,300 interpretive rules
  • CLSI Guidelines - 2010-2025 breakpoints (interpretive rules not available)
  • WHONET Format - Import/export compatibility
  • Veterinary Support - Animal host breakpoints (2,436 veterinary rules)
  • Multi-language - 28-language translation support

Architecture Highlights

┌─────────────────────────────────────────────────────────────┐
│                    Interfaces Layer                         │
│  ┌──────────────┬──────────────────┬────────────────────┐  │
│  │  CLI (Typer) │  API (FastAPI)   │  Python Library    │  │
│  └──────────────┴──────────────────┴────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────┐
│                     Core Engine                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  amr.core - Domain Logic                            │   │
│  │  • SIR interpretation • Antibiograms • Analytics    │   │
│  │  • MO/AB/AV normalization • MDRO detection          │   │
│  │  • Breakpoint resolution • Predictive models        │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────┐
│                   Data & Persistence                        │
│  ┌──────────────────┬─────────────────────────────────┐    │
│  │  Reference Data  │  Run Storage                    │    │
│  │  • Parquet       │  • SQLAlchemy (SQL metadata)    │    │
│  │  • NDJSON        │  • DuckDB (columnar analytics)  │    │
│  │  • Polars        │  • Async queue workers          │    │
│  └──────────────────┴─────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

Technology Stack

  • Python 3.14 - Latest performance improvements and typing features
  • Polars - High-performance DataFrame library with lazy, multi-threaded execution (typically much faster than pandas on analytical workloads)
  • DuckDB - OLAP database for analytical queries
  • SQLAlchemy 2.0 - Async ORM for metadata persistence
  • FastAPI - Modern async web framework
  • Typer - CLI with rich help and validation
  • Pydantic 2.0 - Data validation and serialization

Installation

Prerequisites

  • Python 3.14.x (required)
  • 4GB+ RAM recommended for large datasets
  • 1GB+ disk space for reference data

Install from Source

# Clone repository
git clone https://github.com/beak-insights/AMR.git
cd AMR/python-amr

# Create virtual environment
python3.14 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install with development dependencies
pip install -e ".[dev]"

# Verify installation
amr --version
pytest tests/unit/test_interpretation.py -v

Install Optional Dependencies

# For XLSX import support
pip install pandas openpyxl

# For PostgreSQL persistence
pip install asyncpg

# For advanced plotting
pip install seaborn plotly

Quick Start

CLI Usage

# Interpret susceptibility results
echo '[{"value": "S"}, {"value": "R"}]' | \
  amr sir - --microorganism B_ESCHR_COLI --antimicrobial AMX --guideline "EUCAST 2025"

# Generate antibiogram from CSV
amr antibiogram data/isolates.csv \
  --pathogen-column organism \
  --antimicrobial-columns AMX,CIP,GEN \
  --minimum 30

# Predict resistance trends
amr predict-resistance trends.csv \
  --date-column collection_date \
  --sir-column amoxicillin_sir \
  --frequency monthly \
  --forecast-periods 6

# Normalize microorganism names
amr mo-normalize '["E. coli", "Staph aureus", "Pseudomonas"]'

# Query clinical breakpoints
amr breakpoints-query \
  --guideline "EUCAST 2025" \
  --mo B_ESCHR_COLI \
  --ab CIP \
  --method MIC

API Usage

Start the server:

source .venv/bin/activate
PYTHONPATH=src uvicorn amr.api.app:app --host 0.0.0.0 --port 8000

Make requests:

# Health check
curl http://localhost:8000/health

# SIR interpretation
curl -X POST http://localhost:8000/v1/sir/interpret \
  -H 'Content-Type: application/json' \
  -d '{
    "values": ["S", "I", "R"],
    "microorganism": "B_ESCHR_COLI",
    "antimicrobial": "CIP",
    "guideline": "EUCAST 2025",
    "persist_run": true,
    "persist_metadata": {"source": "lab_system", "batch_id": "20260215"}
  }'

# Generate antibiogram
curl -X POST http://localhost:8000/v1/antibiogram/compute \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [...],
    "pathogen_column": "organism",
    "antimicrobial_columns": ["AMX", "CIP"],
    "minimum": 30
  }'

# Query run history
curl -X POST http://localhost:8000/v1/runs/list \
  -H 'Content-Type: application/json' \
  -d '{"limit": 20, "run_type": "sir"}'
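
The same /v1/sir/interpret call can be issued from Python with only the standard library. The payload mirrors the curl example above; actually sending the request requires the API server to be running locally:

```python
import json
import urllib.request

payload = {
    "values": ["S", "I", "R"],
    "microorganism": "B_ESCHR_COLI",
    "antimicrobial": "CIP",
    "guideline": "EUCAST 2025",
}

# Build the POST request exactly as in the curl example above
req = urllib.request.Request(
    "http://localhost:8000/v1/sir/interpret",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

def interpret() -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# interpret()  # uncomment with the API server running
```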

Python Library

from amr.core.interpretation import as_sir
from amr.core.antibiogram import antibiogram
from amr.core.mo import as_mo
from amr.core.ab_properties import ab_name
import polars as pl

# Interpret susceptibility
results = as_sir(
    values=["S", "I", "R"],
    mo="B_ESCHR_COLI",
    ab="CIP",
    guideline="EUCAST 2025"
)
print(results)  # ["S", "I", "R"]

# Generate antibiogram
df = pl.read_csv("isolates.csv")
abg = antibiogram(
    df,
    pathogen_column="organism",
    antimicrobial_columns=["AMX", "CIP", "GEN"],
    minimum=30
)
print(abg)

# Normalize microorganism
mo_code = as_mo("E. coli")
print(mo_code)  # "B_ESCHR_COLI"

# Get antimicrobial properties
name = ab_name("AMX")
print(name)  # "Amoxicillin"

Core Capabilities

1. SIR Interpretation

Interpret MIC/disk diffusion results using clinical breakpoints:

from amr.core.interpretation import as_sir

# From MIC values
as_sir(
    values=[0.5, 2, 16],
    mo="B_ESCHR_COLI",
    ab="CIP",
    guideline="EUCAST 2025",
    method="MIC"
)
# Returns: ["S", "S", "R"]

# From disk diffusion zones
as_sir(
    values=[25, 18, 12],
    mo="B_STAP_AURE",
    ab="OXA",
    guideline="EUCAST 2025",
    method="disk"
)
# Returns: ["S", "I", "R"]

# With EUCAST interpretive rules
as_sir(
    values=[0.25],
    mo="B_ESCHR_COLI",
    ab="MEM",
    guideline="EUCAST 2025",
    add_intrinsic_resistance=True,
    interpretive_rules="EUCAST"
)

2. Antibiogram Generation

Create resistance profiles with statistical measures:

from amr.core.antibiogram import antibiogram
import polars as pl

df = pl.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "organism": ["E. coli", "E. coli", "K. pneumoniae", "E. coli", "K. pneumoniae"],
    "AMX": ["R", "S", "R", "R", "R"],
    "CIP": ["S", "S", "R", "S", "I"],
    "GEN": ["S", "S", "S", "S", "S"]
})

abg = antibiogram(
    df,
    pathogen_column="organism",
    antimicrobial_columns=["AMX", "CIP", "GEN"],
    minimum=2,  # Minimum isolates per pathogen
    combine_SI=False
)
print(abg)

Output:

┌───────────────┬───────┬──────┬─────┐
│ microorganism ┆ AMX   ┆ CIP  ┆ GEN │
│ ---           ┆ ---   ┆ ---  ┆ --- │
│ str           ┆ f64   ┆ f64  ┆ f64 │
╞═══════════════╪═══════╪══════╪═════╡
│ E. coli       ┆ 66.7  ┆ 0.0  ┆ 0.0 │
│ K. pneumoniae ┆ 100.0 ┆ 50.0 ┆ 0.0 │
└───────────────┴───────┴──────┴─────┘
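
The percentages are resistance rates per pathogen: %R = resistant isolates / tested isolates × 100, reported only where the isolate count meets minimum. A pure-Python sketch of that arithmetic for the AMX column (the package computes this with Polars):

```python
from collections import defaultdict

# (organism, AMX result) pairs from the example DataFrame above
rows = [
    ("E. coli", "R"), ("E. coli", "S"), ("E. coli", "R"),
    ("K. pneumoniae", "R"), ("K. pneumoniae", "R"),
]

# Tally resistant and total isolates per pathogen
counts = defaultdict(lambda: {"R": 0, "n": 0})
for organism, sir in rows:
    counts[organism]["n"] += 1
    counts[organism]["R"] += sir == "R"

minimum = 2  # suppress pathogens with too few isolates
pct_r = {
    org: round(100 * c["R"] / c["n"], 1)
    for org, c in counts.items()
    if c["n"] >= minimum
}
print(pct_r)  # {'E. coli': 66.7, 'K. pneumoniae': 100.0}
```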

3. Microorganism Normalization

Standardize organism names with fuzzy matching:

from amr.core.mo import as_mo, mo_name, mo_taxonomy

# Normalize names
as_mo("E. coli")          # "B_ESCHR_COLI"
as_mo("Staph aureus")     # "B_STAP_AURE"
as_mo("MRSA")             # "B_STAP_AURE"

# Get properties
mo_name("B_ESCHR_COLI")   # "Escherichia coli"
mo_taxonomy("B_ESCHR_COLI", "genus")  # "Escherichia"
mo_taxonomy("B_ESCHR_COLI", "family") # "Enterobacteriaceae"

4. MDRO Detection

Screen for multi-drug resistant organisms:

from amr.core.mdro import mdro
import polars as pl

df = pl.DataFrame({
    "patient": [1, 2],
    "AMX": ["R", "S"],
    "CIP": ["R", "S"],
    "GEN": ["R", "S"],
    "MEM": ["R", "S"]
})

# EUCAST exceptional phenotypes
results = mdro(df, guideline="EUCAST")
print(results)  # ["Pos", "Neg"]

5. Resistance Prediction

Forecast future resistance trends:

from amr.core.prediction import resistance_predict
import polars as pl
from datetime import date

df = pl.DataFrame({
    "date": [date(2024, i, 1) for i in range(1, 13)],
    "sir": ["S"]*6 + ["R"]*6
})

forecast = resistance_predict(
    df,
    col_date="date",
    col_sir="sir",
    model="ARIMA",
    forecast_periods=6,
    frequency="monthly"
)
print(forecast)
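
For intuition, the exponential-smoothing option can be sketched in a few lines: smooth the observed resistance proportions, then carry the final level forward (simple smoothing yields a flat forecast; ARIMA additionally models trend and seasonality). This helper is illustrative, not the package's implementation:

```python
def exp_smooth_forecast(series: list[float], alpha: float = 0.5,
                        periods: int = 6) -> list[float]:
    """Simple exponential smoothing; the forecast repeats the final level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level  # blend new observation into level
    return [round(level, 3)] * periods

# Monthly resistance proportions matching the S->R shift in the example above
monthly_r = [0.0] * 6 + [1.0] * 6
print(exp_smooth_forecast(monthly_r, alpha=0.5, periods=3))  # [0.984, 0.984, 0.984]
```

With alpha=0.5 the level converges quickly toward the recent all-resistant months, which is why the forecast sits near 1.0.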

Data Pipeline

Python AMR uses a canonical NDJSON-based data pipeline with quality assurance built in.

Pipeline Architecture

External Sources   →  Import     →  Canonical NDJSON     →  Transforms  →  Snapshots (Parquet)
(TSV/CSV/XLSX/RDA)    (Scripts)     (data-raw/sources/)     (Python)       (data/snapshots/)
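
The canonical sources are NDJSON: one JSON record per line, which keeps them diff-friendly and streamable. Reading them needs only the standard library; the records below are invented for illustration:

```python
import io
import json

# One record per line, as in data-raw/sources/ (content invented here)
ndjson = io.StringIO(
    '{"mo": "B_ESCHR_COLI", "fullname": "Escherichia coli"}\n'
    '{"mo": "B_STAP_AURE", "fullname": "Staphylococcus aureus"}\n'
)

# Stream line by line; blank lines are skipped
records = [json.loads(line) for line in ndjson if line.strip()]
print(len(records), records[0]["mo"])  # 2 B_ESCHR_COLI
```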

Reference Datasets

Dataset                Records     Description
microorganisms         75,000+     Taxonomic data, prevalence, SNOMED codes
antimicrobials         465+        Antibiotics with ATC, LOINC, PubChem, DDDs
antivirals             85+         Antivirals with LOINC codes
clinical_breakpoints   50,000+     EUCAST/CLSI breakpoints (2010-2025)
interpretive_rules     1,300+      EUCAST expert rules (v10-15)
intrinsic_resistant    10,000+     Natural resistance combinations
translations           28 langs    Multi-language support

Pipeline Commands

# Bootstrap NDJSON from snapshots (one-time setup)
PYTHONPATH=src python scripts/export_raw_sources.py

# Import external formats to NDJSON
PYTHONPATH=src python scripts/import_raw_sources.py

# Run full transformation pipeline
PYTHONPATH=src python scripts/run_data_pipeline.py

# Validate schemas
amr validate-schemas

# Check data quality
PYTHONPATH=src python scripts/check_snapshot_raw_parity.py
PYTHONPATH=src python scripts/data_qa_report.py

Data Refresh

Update reference data from authoritative sources:

# Refresh LOINC codes
PYTHONPATH=src python scripts/refresh_loinc.py

# Refresh SNOMED CT codes
PYTHONPATH=src python scripts/refresh_snomed.py

# Refresh PubChem data (slow, ~10 min)
PYTHONPATH=src python scripts/refresh_pubchem.py

# Run all refresh pipelines
PYTHONPATH=src python scripts/refresh_all_data.py --dry-run

See Data Refresh Guide for detailed instructions.


Documentation

Comprehensive documentation organized by audience:

  • Getting Started
  • Architecture & Design
  • API & CLI Reference
  • Data Management
  • Operations & Deployment
  • Testing & Quality
  • Domain Knowledge
  • Contributing


Development

Setup Development Environment

# Clone and install
git clone https://github.com/beak-insights/AMR.git
cd AMR/python-amr
python3.14 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run linters
ruff check src tests scripts
ruff format src tests scripts

# Type checking
mypy src

# Run tests
pytest -v
pytest tests/unit -v
pytest tests/api -v
pytest tests/parity -v

Local CI Gate

Run the complete quality gate locally:

./scripts/run_local_ci.sh

This runs:

  1. Code formatting (ruff)
  2. Type checking (mypy)
  3. Unit tests (pytest)
  4. API integration tests
  5. Parity checks (raw/snapshot consistency)
  6. Performance regression checks

Project Structure

python-amr/
├── src/amr/                    # Package source
│   ├── api/                    # FastAPI application
│   ├── cli/                    # Typer CLI commands
│   ├── core/                   # Domain logic
│   ├── data/                   # Reference data and pipelines
│   │   ├── ingest/            # Multi-format import
│   │   ├── pipeline/          # Transform orchestration
│   │   ├── transforms/        # Dataset transformations
│   │   ├── qa/                # Quality assurance
│   │   └── refresh/           # Data refresh pipelines
│   ├── repositories/          # Persistence layer
│   ├── engines/               # DuckDB helpers
│   └── compat/                # R compatibility aliases
├── tests/                      # Test suites
│   ├── unit/                  # Unit tests
│   ├── api/                   # API integration tests
│   ├── parity/                # R compatibility tests
│   ├── golden/                # Regression tests
│   └── perf/                  # Performance tests
├── scripts/                    # Operational scripts
├── data/                       # Runtime data
│   ├── snapshots/             # Parquet datasets
│   └── manifests/             # Metadata and QA reports
├── data-raw/                   # Canonical sources
│   ├── sources/               # NDJSON datasets
│   └── external/              # Import staging
├── docs/                       # Documentation
└── benchmarks/                 # Performance baselines

Testing

Python AMR has comprehensive test coverage with multiple test types:

Test Categories

# Unit tests - Core logic (fast)
pytest tests/unit -v

# API tests - Integration tests (medium)
pytest tests/api -v

# Parity tests - R AMR compatibility (slow)
pytest tests/parity -v -m "not slow"

# Golden tests - Regression protection
pytest tests/golden -v

# Performance tests - Throughput guardrails
pytest tests/perf -v

Code Coverage

# Generate coverage report
pytest --cov=amr --cov-report=html --cov-report=term
open htmlcov/index.html

Continuous Integration

GitHub Actions workflow runs on every push:

  • Linting (ruff)
  • Type checking (mypy)
  • Full test suite
  • Data pipeline parity checks
  • Performance regression detection

See .github/workflows/python-amr-ci.yml for details.


Performance

Python AMR is optimized for high-throughput scenarios:

Benchmarks

Operation            Records    Time     Throughput
SIR interpretation   100,000    0.8s     125k/sec
Antibiogram          10,000     0.3s     33k/sec
MO normalization     50,000     1.2s     42k/sec
Breakpoint query     1,000      0.05s    20k/sec

Performance Features

  • Polars DataFrames - Parallel execution with lazy evaluation
  • DuckDB Analytics - Columnar storage with vectorized execution
  • Async I/O - Non-blocking persistence
  • Worker Pools - Configurable parallelism
  • Batch Processing - Automatic batching for large datasets

Tuning

# Increase async workers
export AMR_PERSIST_QUEUE_WORKERS=8

# Increase queue size
export AMR_PERSIST_QUEUE_MAXSIZE=8192

# Adjust retry behavior
export AMR_PERSIST_RETRY_MAX_RETRIES=5
export AMR_PERSIST_RETRY_BACKOFF_MS=100
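
A common pattern for consuming such settings in code (the variable names match the block above; the env_int helper is illustrative, not part of the package):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    raw = os.environ.get(name)
    return int(raw) if raw else default

os.environ["AMR_PERSIST_QUEUE_WORKERS"] = "8"
os.environ.pop("AMR_PERSIST_QUEUE_MAXSIZE", None)  # ensure unset for the demo

workers = env_int("AMR_PERSIST_QUEUE_WORKERS", 4)   # set above -> 8
maxsize = env_int("AMR_PERSIST_QUEUE_MAXSIZE", 4096)  # unset -> default
print(workers, maxsize)  # 8 4096
```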

See Performance Guardrails for detailed tuning.


Port Status

Python AMR is a complete port of the R AMR package with all critical features implemented.

Implemented Features

Feature                  Status     Notes
SIR interpretation       Complete   EUCAST/CLSI breakpoints
Antibiogram              Complete   Full statistical measures
MO/AB/AV normalization   Complete   75k+ organisms, 465+ antibiotics
MDRO detection           Complete   EUCAST guidelines
First isolate            Complete   Episode-based deduplication
Breakpoint queries       Complete   50k+ breakpoints
EUCAST rules             Complete   ~1,300 interpretive rules
CLSI breakpoints         Complete   2010-2025 data
Veterinary support       Complete   2,436 animal breakpoints
Translation              Complete   28 languages
Resistance prediction    Complete   ARIMA/exponential smoothing
Data refresh             Partial    LOINC, SNOMED, PubChem (not taxonomy)

Known Limitations

  • CLSI interpretive rules - Not available (see CLSI Support)
  • Taxonomy refresh - Framework only, merge logic pending
  • ATC code refresh - Not yet implemented
  • WHONET code refresh - Not yet implemented

See Port Status for complete compatibility matrix.


Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Code contributions - Follow PEP 8, add tests, update docs
  2. Documentation - See Contributing Docs
  3. Bug reports - Use GitHub issues with reproducible examples
  4. Feature requests - Discuss in issues before implementation

Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes with tests
  4. Run local CI gate (./scripts/run_local_ci.sh)
  5. Commit with descriptive messages
  6. Push and create a pull request

License

This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.

This is the same license as the R AMR package to maintain compatibility.


Acknowledgments

R AMR Package Authors

This Python port is based on the excellent R AMR package created by:

  • Matthijs S. Berends (maintainer)
  • Christian F. Luz
  • Alexander W. Friedrich
  • Bhanu N. M.
  • Casper J. Albers
  • And many other contributors

Python Port

  • Implementation: Claude (Anthropic) under guidance from Beak Insights team
  • Validation: Parity testing against R AMR package outputs
  • Infrastructure: FastAPI, Polars, DuckDB, SQLAlchemy communities

Data Sources

  • EUCAST - European Committee on Antimicrobial Susceptibility Testing
  • CLSI - Clinical and Laboratory Standards Institute
  • LOINC - Logical Observation Identifiers Names and Codes
  • SNOMED CT - Systematized Nomenclature of Medicine
  • PubChem - National Library of Medicine
  • GBIF - Global Biodiversity Information Facility
  • LPSN - List of Prokaryotic names with Standing in Nomenclature


Version: 0.1.0 · Python: 3.14+ · Last Updated: 2026-02-15 · Status: Production Ready (with documented limitations)
