🔥 Blazingly fast ML inference server powered by Rust and Burn framework
Updated Jul 25, 2025 - Rust
A curated list of awesome open source and commercial platforms for serving models in production 🚀
Serving large ML models independently and asynchronously, using a message queue and key-value storage to communicate with other services [EXPERIMENT]
Collection of OSS models that are containerized into a serving container
Miscellaneous code and writings on MLOps
Integrating Aporia ML model monitoring into a Bodywork serving pipeline.
Big ML Project with infrastructure (MLflow, Minio, Grafana), backend (FastAPI, Catboost) and frontend (React, Maplibre)
Energy consumption of ML inference with runtime engines
🌐 Language identification for Scandinavian languages
Applied Machine Learning Projects
Example solution to the MLOps Case Study covering both online and batch processing.
Low-latency feature store for real-time ML serving with online/offline consistency
Production ML model serving with FastAPI, Docker, Prometheus metrics & async inference
Production-oriented ML serving stack with Docker, FastAPI, Docker Compose and GitHub Actions.
Animated flow diagrams with token simulation: JSON-configured, with zero framework dependencies
Resources for serving models in production
Production-grade model serving layer wrapping Ollama with request batching, SSE streaming, backpressure, and Prometheus metrics. OpenAI-compatible API.
Heterogeneous-system ML pipeline scheduling framework with Triton Inference Server as the backend
Serve ML models via FastAPI with real-time predictions from trained classifiers.