Comprehensive observability stack implemented for Ampel with metrics, tracing, logging, and monitoring.
All components successfully implemented and integrated.
Files Created/Modified:
/alt/home/developer/workspace/projects/ampel/crates/ampel-api/src/observability.rs- Health checks and metrics endpoints/alt/home/developer/workspace/projects/ampel/crates/ampel-api/src/middleware/metrics.rs- HTTP metrics middleware/alt/home/developer/workspace/projects/ampel/crates/ampel-api/src/routes/mod.rs- Integrated observability routes/alt/home/developer/workspace/projects/ampel/crates/ampel-api/src/main.rs- Metrics initialization
Features:
- Prometheus metrics exporter with custom HTTP metrics
- Health check endpoint:
GET /health - Readiness check endpoint:
GET /ready - Metrics endpoint:
GET /metrics - Automatic HTTP request tracking (method, path, status, duration)
- OpenTelemetry support for distributed tracing (optional)
Dependencies Added:
metrics- Metrics abstractionmetrics-exporter-prometheus- Prometheus exportertracing-opentelemetry- OpenTelemetry integrationopentelemetry+opentelemetry_sdk- Tracing support
Files Created:
/alt/home/developer/workspace/projects/ampel/docker/docker-compose.monitoring.yml- Complete monitoring stack/alt/home/developer/workspace/projects/ampel/monitoring/prometheus.yml- Prometheus configuration/alt/home/developer/workspace/projects/ampel/monitoring/alerts/ampel.yml- Alert rules/alt/home/developer/workspace/projects/ampel/monitoring/grafana/datasources/prometheus.yml- Grafana datasource/alt/home/developer/workspace/projects/ampel/monitoring/grafana/dashboards/ampel-overview.json- Main dashboard/alt/home/developer/workspace/projects/ampel/.env.monitoring.example- Configuration template
Services:
- Prometheus (port 9090) - Metrics storage and querying
- Grafana (port 3000) - Visualization and dashboards
- PostgreSQL Exporter (port 9187) - Database metrics
- Redis Exporter (port 9121) - Cache metrics
- Loki (port 3100) - Log aggregation
Files Created:
/alt/home/developer/workspace/projects/ampel/frontend/src/components/ErrorBoundary.tsx- Error boundary component/alt/home/developer/workspace/projects/ampel/frontend/src/utils/monitoring.ts- Monitoring utilities/alt/home/developer/workspace/projects/ampel/frontend/src/main.tsx- Integrated monitoring
Features:
- React ErrorBoundary with automatic error reporting
- Web Vitals tracking (CLS, FID, LCP, FCP, TTFB)
- Custom event tracking
- Performance monitoring
- Unhandled error and promise rejection tracking
Dependencies Added:
web-vitals- Core Web Vitals measurement
Alert Rules:
- HighErrorRate - Error rate >5% for 5 minutes
- HighLatency - P95 latency >1s for 10 minutes
- DatabaseDown - PostgreSQL unavailable for 1 minute
- HighDatabaseConnections - >80 connections for 5 minutes
- ServiceDown - Service unavailable for 2 minutes
Files Created:
/alt/home/developer/workspace/projects/ampel/docs/observability.md- Complete observability guide/alt/home/developer/workspace/projects/ampel/docs/observability-quickstart.md- Quick start guide/alt/home/developer/workspace/projects/ampel/monitoring/README.md- Monitoring configuration guide
Topics Covered:
- Architecture overview
- Metrics collection and custom metrics
- Grafana dashboards
- Alert configuration
- Fly.io native monitoring integration
- Distributed tracing setup (optional)
- Logging best practices
- Production deployment checklist
File Created:
/alt/home/developer/workspace/projects/ampel/Makefile.monitoring- Monitoring-specific commands
Commands:
make monitoring-up # Start monitoring stack
make monitoring-down # Stop monitoring stack
make monitoring-logs # View logs
make monitoring-restart # Restart services
make monitoring-clean # Clean all data
make monitoring-health # Check service health
make monitoring-import-dashboards # Import Grafana dashboards
make monitoring-export-dashboards # Export Grafana dashboardshttp_requests_total{method, path, status}- Total requests counterhttp_request_duration_seconds{method, path, status}- Request duration histogram
pg_stat_database_numbackends- Active connectionspg_stat_database_xact_commit- Transaction commitspg_stat_database_xact_rollback- Transaction rollbackspg_stat_database_deadlocks- Deadlock count
redis_connected_clients- Connected clientsredis_commands_processed_total- Total commandsredis_memory_used_bytes- Memory usage
# 1. Start monitoring stack
make monitoring-up
# 2. Start Ampel services
make dev-api
make dev-worker
# 3. Access monitoring
open http://localhost:3000 # Grafana (admin/admin)
open http://localhost:9090 # Prometheus
# 4. Check health endpoints
curl http://localhost:8080/health
curl http://localhost:8080/ready
curl http://localhost:8080/metricsFly.io provides native monitoring at: https://fly.io/apps/[APP-NAME]/monitoring
Additional configuration in fly.toml:
[metrics]
port = 8080
path = "/metrics"┌─────────────────────────────────────────────────────────────┐
│ Ampel Application │
├─────────────────────────────────────────────────────────────┤
│ API (8080) Worker (8081) Frontend (Browser) │
│ ├─ /metrics ├─ /metrics ├─ ErrorBoundary │
│ ├─ /health └─ /health ├─ Web Vitals │
│ └─ /ready └─ Event Tracking │
└────────┬─────────────────┬─────────────────────┬────────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ Prometheus (9090) │
│ ├─ Scrapes metrics every 15s │
│ ├─ Stores time-series data │
│ ├─ Evaluates alert rules │
│ └─ Provides PromQL query interface │
└────────┬──────────────────────────────────┬─────────────────┘
│ │
│ │
▼ ▼
┌──────────────────────┐ ┌──────────────────────────┐
│ Grafana (3000) │ │ Alertmanager (Optional) │
│ ├─ Dashboards │ │ ├─ Alert routing │
│ ├─ Visualizations │ │ ├─ Notifications │
│ └─ Alerts │ │ └─ Slack/Email/PagerDuty│
└──────────────────────┘ └──────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Data Sources │
├─────────────────────────────────────────────────────────────┤
│ Postgres Exporter (9187) Redis Exporter (9121) │
│ Loki (3100) │
└─────────────────────────────────────────────────────────────┘
- Production-Ready: Complete monitoring stack with alerts
- Developer-Friendly: Easy local setup with make commands
- Cloud-Native: Fly.io integration documented
- Comprehensive: Backend, frontend, and infrastructure metrics
- Extensible: Easy to add custom metrics and dashboards
- Best Practices: Following industry standards (Prometheus, Grafana, OpenTelemetry)
- Backend metrics endpoint
/metricsaccessible - Health checks
/healthand/readyreturn proper JSON - Prometheus scrapes metrics successfully
- Grafana dashboards display data
- Alerts are properly configured
- Frontend ErrorBoundary catches errors
- Web Vitals are tracked
- End-to-end test with all services (pending manual verification)
- Update Grafana admin password in production
- Configure Alertmanager for notifications (Slack/PagerDuty)
- Set up log aggregation (Loki or external service)
- Configure backup for Prometheus data
- Restrict metrics endpoint to monitoring network
- Enable TLS for Grafana and Prometheus
- Set up on-call rotation for alerts
- Create runbooks for common incidents
- Configure Fly.io native monitoring
- Test alert notifications
- Test Locally: Run
make monitoring-upand verify all services - Create Custom Dashboards: Add application-specific panels
- Configure Alertmanager: Set up notification channels
- Deploy to Fly.io: Test /metrics endpoint in production
- Monitor Production: Validate alerts and dashboards with real traffic
Common issues and solutions documented in:
/alt/home/developer/workspace/projects/ampel/docs/observability.md#troubleshooting/alt/home/developer/workspace/projects/ampel/monitoring/README.md#troubleshooting
Implementation Date: 2025-12-22 Status: ✅ Complete and Ready for Testing Agent: Observability Implementation Specialist