GET /health returns the health status of the service.
Response (200 OK):

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "checks": {
    "database": true
  }
}
```

Response (503 Service Unavailable):

```json
{
  "status": "unhealthy",
  "version": "0.1.0",
  "checks": {
    "database": false
  }
}
```

Usage:

```bash
curl http://localhost:8080/health
```

Use Case: Load balancer health checks, basic service availability
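To make the 200/503 contract concrete, here is a minimal, dependency-free Rust sketch of how a handler could derive the status code and body from its checks. `HealthChecks` and `health_response` are hypothetical names, not Ampel's actual code. The JSON is built by hand only to keep the example self-contained (a real handler would more likely use serde_json), and the sketch assumes the version string contains no characters that need JSON escaping.

```rust
/// Hypothetical result of the service's dependency checks.
pub struct HealthChecks {
    pub database: bool,
}

/// Builds the (HTTP status code, JSON body) pair for GET /health:
/// 200 with "healthy" when every check passes, 503 with "unhealthy" otherwise.
pub fn health_response(version: &str, checks: &HealthChecks) -> (u16, String) {
    let healthy = checks.database;
    let (code, status) = if healthy { (200, "healthy") } else { (503, "unhealthy") };
    // Hand-built JSON; assumes `version` needs no escaping.
    let body = format!(
        "{{\"status\":\"{}\",\"version\":\"{}\",\"checks\":{{\"database\":{}}}}}",
        status, version, checks.database
    );
    (code, body)
}
```

Load balancers typically look only at the status code, so keeping the code and the `status` field derived from the same boolean avoids the two ever disagreeing.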
GET /ready returns the readiness status of the service and is suitable for Kubernetes readiness probes.
Response (200 OK):

```json
{
  "ready": true,
  "checks": {
    "database": true
  }
}
```

Response (503 Service Unavailable):

```json
{
  "ready": false,
  "checks": {
    "database": false
  }
}
```

Usage:

```bash
curl http://localhost:8080/ready
```

Use Case: Kubernetes readiness probes, deployment validation
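Kubernetes does not read the JSON body: an httpGet probe succeeds when the status code is at least 200 and below 400. The sketch below is a simplified Rust model (not kubelet code) of how consecutive /ready results change a Pod's readiness. `ReadinessTracker` is a hypothetical name; its thresholds correspond to the probe fields `failureThreshold` (default 3) and `successThreshold` (default 1), which the Pod spec in this document leaves at their defaults.

```rust
/// Simplified model of a Kubernetes readiness probe consuming GET /ready.
pub struct ReadinessTracker {
    failure_threshold: u32,
    success_threshold: u32,
    consecutive_failures: u32,
    consecutive_successes: u32,
    ready: bool,
}

impl ReadinessTracker {
    pub fn new(failure_threshold: u32, success_threshold: u32) -> Self {
        // Readiness starts as "not ready" until probes succeed.
        Self {
            failure_threshold,
            success_threshold,
            consecutive_failures: 0,
            consecutive_successes: 0,
            ready: false,
        }
    }

    /// Records one probe result (the HTTP status returned by /ready)
    /// and returns whether the Pod is currently considered ready.
    pub fn observe(&mut self, status: u16) -> bool {
        if (200..400).contains(&status) {
            self.consecutive_successes += 1;
            self.consecutive_failures = 0;
            if self.consecutive_successes >= self.success_threshold {
                self.ready = true;
            }
        } else {
            self.consecutive_failures += 1;
            self.consecutive_successes = 0;
            if self.consecutive_failures >= self.failure_threshold {
                self.ready = false;
            }
        }
        self.ready
    }
}
```

With the defaults, a single 503 does not remove the Pod from Service endpoints; three consecutive 503 responses (30 s at `periodSeconds: 10`) do.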
GET /metrics returns metrics in the Prometheus text exposition format for scraping.
Response (200 OK, excerpt; the remaining buckets, including le="+Inf", are omitted):

```text
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api/pull-requests",status="200"} 42
# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{method="GET",path="/api/pull-requests",status="200",le="0.005"} 12
http_request_duration_seconds_bucket{method="GET",path="/api/pull-requests",status="200",le="0.01"} 28
http_request_duration_seconds_bucket{method="GET",path="/api/pull-requests",status="200",le="0.025"} 38
http_request_duration_seconds_sum{method="GET",path="/api/pull-requests",status="200"} 0.856
http_request_duration_seconds_count{method="GET",path="/api/pull-requests",status="200"} 42
```

Usage:

```bash
curl http://localhost:8080/metrics
```

Use Case: Prometheus scraping, metrics analysis
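The text format shown above is line-oriented: HELP and TYPE comments, then one sample line per label combination. As an illustration only, this std-only Rust sketch keeps request counts in a map keyed by (method, path, status) and renders them in that format. `RequestCounter` and `escape` are hypothetical; the service itself presumably relies on a metrics library's exporter rather than hand-rolled rendering.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical in-memory counter family keyed by (method, path, status).
pub struct RequestCounter {
    counts: BTreeMap<(String, String, String), u64>,
}

impl RequestCounter {
    pub fn new() -> Self {
        Self { counts: BTreeMap::new() }
    }

    /// Increments the series for one handled request.
    pub fn inc(&mut self, method: &str, path: &str, status: u16) {
        let key = (method.to_string(), path.to_string(), status.to_string());
        *self.counts.entry(key).or_insert(0) += 1;
    }

    /// Renders the family in the Prometheus text exposition format.
    pub fn render(&self) -> String {
        let mut out = String::new();
        out.push_str("# HELP http_requests_total Total number of HTTP requests\n");
        out.push_str("# TYPE http_requests_total counter\n");
        for ((method, path, status), n) in &self.counts {
            writeln!(
                out,
                "http_requests_total{{method=\"{}\",path=\"{}\",status=\"{}\"}} {}",
                escape(method),
                escape(path),
                escape(status),
                n
            )
            .unwrap();
        }
        out
    }
}

/// Label values must escape backslash, double quote, and line feed.
fn escape(v: &str) -> String {
    v.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}
```

Using a `BTreeMap` keeps the output order stable between scrapes, which makes the endpoint easier to diff by eye; Prometheus itself does not require any particular order.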
The API exports the following metrics:

http_requests_total
- Type: Counter
- Labels: method, path, status
- Description: Total number of HTTP requests handled by the API

http_request_duration_seconds
- Type: Histogram
- Labels: method, path, status
- Buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
- Description: HTTP request duration in seconds
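Histogram buckets are cumulative: the `le="0.01"` bucket counts every observation at or below 0.01 s, not only those between 0.005 and 0.01. This is why the example counts grow from 12 to 28 to 38, and why the implicit `+Inf` bucket always equals `_count`. A minimal Rust sketch of that bookkeeping, using the bucket bounds listed above (the `Histogram` type is hypothetical, not a library API):

```rust
/// Bucket upper bounds listed above, in seconds.
pub const BUCKETS: [f64; 11] = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0];

/// Hypothetical Prometheus-style histogram with cumulative buckets.
pub struct Histogram {
    bucket_counts: [u64; 11],
    sum: f64,
    count: u64,
}

impl Histogram {
    pub fn new() -> Self {
        Self { bucket_counts: [0; 11], sum: 0.0, count: 0 }
    }

    /// One observation increments every bucket whose bound is >= value.
    pub fn observe(&mut self, value: f64) {
        for (bound, c) in BUCKETS.iter().zip(self.bucket_counts.iter_mut()) {
            if value <= *bound {
                *c += 1;
            }
        }
        self.sum += value;
        self.count += 1;
    }

    /// (le, cumulative count) pairs, ending with the +Inf bucket.
    pub fn buckets(&self) -> Vec<(f64, u64)> {
        let mut v: Vec<(f64, u64)> = BUCKETS
            .iter()
            .copied()
            .zip(self.bucket_counts.iter().copied())
            .collect();
        v.push((f64::INFINITY, self.count));
        v
    }

    /// The values exported as _sum and _count.
    pub fn sum_and_count(&self) -> (f64, u64) {
        (self.sum, self.count)
    }
}
```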
Example PromQL queries:

Request rate by endpoint:

```promql
sum by (path) (rate(http_requests_total[5m]))
```

Error rate (fraction of requests answered with a 5xx status):

```promql
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
```

P95 latency per endpoint:

```promql
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, path)
)
```

Slowest endpoints (P99):

```promql
topk(5,
  histogram_quantile(0.99,
    sum(rate(http_request_duration_seconds_bucket[5m])) by (le, path)
  )
)
```
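`histogram_quantile` estimates a quantile by finding the bucket that contains the target rank (q × total count) and interpolating linearly inside it. The Rust function below is a simplified sketch of that technique, not Prometheus's implementation: it assumes non-negative bucket bounds and cumulative counts ending in a `+Inf` bucket, and it returns `None` for q outside (0, 1], whereas PromQL returns -Inf for q < 0 and +Inf for q > 1.

```rust
/// Simplified histogram_quantile over cumulative (upper_bound, count)
/// buckets sorted by bound, with the last bucket being +Inf.
pub fn histogram_quantile(q: f64, buckets: &[(f64, f64)]) -> Option<f64> {
    if buckets.len() < 2 || !(q > 0.0 && q <= 1.0) {
        return None;
    }
    let total = buckets.last()?.1; // +Inf bucket == total observations
    if total == 0.0 {
        return None;
    }
    let rank = q * total;
    // First bucket whose cumulative count reaches the rank.
    let i = buckets.iter().position(|&(_, c)| c >= rank)?;
    if i == buckets.len() - 1 {
        // Rank falls in +Inf: return the highest finite bound.
        return Some(buckets[buckets.len() - 2].0);
    }
    let (upper, count) = buckets[i];
    // The lowest bucket is assumed to start at 0.
    let (lower, prev_count) = if i == 0 { (0.0, 0.0) } else { buckets[i - 1] };
    Some(lower + (upper - lower) * (rank - prev_count) / (count - prev_count))
}
```

Treating the /metrics excerpt above as if 0.005, 0.01, 0.025 and +Inf (equal to `_count`, 42) were the only buckets, the median lands in the 0.01 bucket: 0.005 + 0.005 × (21 − 12) / (28 − 12) = 0.0078125 s. The estimate is only as fine as the bucket layout, so choose bounds near the latencies you care about.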
Add to prometheus.yml:

```yaml
scrape_configs:
  - job_name: 'ampel-api'
    metrics_path: '/metrics'
    scrape_interval: 15s
    static_configs:
      - targets: ['api:8080']
```

Kubernetes Pod spec with liveness (/health) and readiness (/ready) probes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ampel-api
spec:
  containers:
    - name: api
      image: ampel-api:latest
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 30
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
```

Dockerfile health check (the image must include curl):

```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1
```

Add to fly.toml:
```toml
[metrics]
  port = 8080
  path = "/metrics"

[checks]
  [checks.alive]
    type = "http"
    port = 8080
    method = "get"
    path = "/health"
    interval = "30s"
    timeout = "2s"
    grace_period = "5s"
```

Add custom business metrics in your Rust code:
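Counter:

```rust
use metrics::counter;

counter!("pull_requests_synced_total",
    "provider" => "github",
    // to_string() gives an owned label value whether repo_name is a &str or a String
    "repository" => repo_name.to_string(),
    "status" => "success"
).increment(1);
```

Histogram (time an operation and record the duration in seconds):

```rust
use metrics::histogram;
use std::time::Instant;

let start = Instant::now();
// ... perform operation
let duration = start.elapsed();
histogram!("repository_sync_duration_seconds",
    "provider" => "github"
).record(duration.as_secs_f64());
```

Gauge:

```rust
use metrics::gauge;

gauge!("active_repositories",
    "status" => "enabled"
).set(count as f64);
```

Keep label cardinality low: every unique combination of label values creates a separate time series, and each series consumes memory in the application and in Prometheus.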
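❌ Bad: Using user IDs as labels (one series per user)

```rust
counter!("requests_total", "user_id" => user_id.to_string()).increment(1);
```

✅ Good: Aggregate by user type (a small, fixed set of values)

```rust
counter!("requests_total", "user_type" => "premium").increment(1);
```

Follow Prometheus naming conventions: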
- Use the _total suffix for counters
- Use base units (seconds, bytes) rather than milliseconds or megabytes
- Use descriptive names

Choose the metric type that matches the data:

- Counter: Monotonically increasing values (requests, errors)
- Gauge: Values that go up and down (memory usage, active connections)
- Histogram: Distribution of values (request duration, response size)
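These conventions are easy to check mechanically, for example in a unit test over the names your code registers. The lint below is a hypothetical Rust sketch: it assumes a name is valid when it matches the Prometheus pattern `[a-zA-Z_:][a-zA-Z0-9_:]*`, treats `_total` as reserved for counters, and uses a short, non-exhaustive list of non-base unit suffixes.

```rust
/// Metric kinds used in this document.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Kind {
    Counter,
    Gauge,
    Histogram,
}

/// Hypothetical lint: returns human-readable problems (empty if none).
pub fn lint_metric_name(kind: Kind, name: &str) -> Vec<String> {
    let mut problems = Vec::new();
    // Valid names match [a-zA-Z_:][a-zA-Z0-9_:]*.
    let mut chars = name.chars();
    let valid = match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' || c == ':' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == ':')
        }
        _ => false,
    };
    if !valid {
        problems.push(format!("{}: not a valid Prometheus metric name", name));
    }
    if kind == Kind::Counter && !name.ends_with("_total") {
        problems.push(format!("{}: counters should end in _total", name));
    }
    if kind != Kind::Counter && name.ends_with("_total") {
        problems.push(format!("{}: _total is conventionally reserved for counters", name));
    }
    // Heuristic check for non-base units, not exhaustive.
    for unit in ["_ms", "_milliseconds", "_mb", "_megabytes", "_kb"] {
        if name.ends_with(unit) || name.contains(&format!("{}_", unit)) {
            problems.push(format!("{}: use base units (seconds, bytes) instead of {}", name, unit));
        }
    }
    problems
}
```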
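Always track errors, labeled by a low-cardinality error type:

```rust
if let Err(e) = sync_repository(&db, repo_id).await {
    // classify_error should map each error to one of a small, fixed set of category names
    counter!("sync_errors_total",
        "error_type" => classify_error(&e)
    ).increment(1);
}
```

If metrics are not appearing:

- Check that the metrics endpoint returns data:

  ```bash
  curl http://localhost:8080/metrics | grep http_requests_total
  ```

- Verify that Prometheus is scraping the API: open http://localhost:9090/targets and confirm that the ampel-api target is UP.

- Check the Prometheus logs for scrape errors:

  ```bash
  docker logs ampel-prometheus
  ```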
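Where the runtime image lacks curl, the first check can be approximated with a small std-only Rust helper. This is a hypothetical sketch, not part of the Ampel codebase: `http_get` sends an HTTP/1.0 request, so the server closes the connection after responding and does not use chunked encoding, and `has_metric` looks for a sample line (not a comment) for the given metric name.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Minimal HTTP GET using only std; returns (status code, body).
pub fn http_get(host_port: &str, path: &str) -> std::io::Result<(u16, String)> {
    let mut stream = TcpStream::connect(host_port)?;
    stream.set_read_timeout(Some(Duration::from_secs(3)))?;
    // HTTP/1.0: no keep-alive and no chunked transfer encoding.
    let request = format!("GET {} HTTP/1.0\r\nHost: {}\r\n\r\n", path, host_port);
    stream.write_all(request.as_bytes())?;
    let mut raw = String::new();
    stream.read_to_string(&mut raw)?;
    // Status line looks like "HTTP/1.1 200 OK".
    let status: u16 = raw
        .split_whitespace()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .ok_or_else(|| std::io::Error::new(std::io::ErrorKind::InvalidData, "malformed status line"))?;
    let body = raw
        .split_once("\r\n\r\n")
        .map(|(_, b)| b.to_string())
        .unwrap_or_default();
    Ok((status, body))
}

/// True if some non-comment line is a sample of exactly `name`.
pub fn has_metric(body: &str, name: &str) -> bool {
    body.lines().filter(|l| !l.starts_with('#')).any(|l| {
        l.strip_prefix(name)
            .map_or(false, |rest| rest.starts_with('{') || rest.starts_with(' '))
    })
}
```

For example, `http_get("localhost:8080", "/metrics")` followed by `has_metric(&body, "http_requests_total")` mirrors the curl and grep pipeline, without matching longer names that merely share the prefix.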
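If Prometheus is using too much memory:

- Check label cardinality (number of series per metric name):

  ```promql
  count({__name__=~".+"}) by (__name__)
  ```

- Identify high-cardinality metrics with the TSDB status API, which reports series counts per metric name and per label:

  ```bash
  curl http://localhost:9090/api/v1/status/tsdb
  ```

- Review metric labels and reduce the number of unique label combinations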
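Each unique label combination is its own series, which is what the `count(...) by (__name__)` query measures. For a quick offline look before Prometheus is involved, this hypothetical Rust helper counts distinct series per metric name in a /metrics response body. It assumes well-formed text-format lines (`name{labels} value` or `name value`) and counts histogram `_bucket`, `_sum`, and `_count` series under their own names.

```rust
use std::collections::{BTreeMap, HashSet};

/// Distinct series per metric name in Prometheus text-format output.
pub fn series_per_metric(body: &str) -> BTreeMap<String, usize> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut counts: BTreeMap<String, usize> = BTreeMap::new();
    for line in body
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
    {
        // Series identity: name plus label set, i.e. everything before the value.
        let series = match line.rfind('}') {
            Some(end) => &line[..=end],
            None => line.split_whitespace().next().unwrap_or(line),
        };
        if seen.insert(series) {
            let name = series.split('{').next().unwrap_or(series);
            *counts.entry(name.to_string()).or_insert(0) += 1;
        }
    }
    counts
}
```

A metric whose count keeps growing as traffic arrives (for example one labeled by user ID, as in the ❌ example above) is the usual culprit.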