StillMe Rollback Guide

Overview

This guide explains how to roll back StillMe deployments on Docker, Docker Compose, and Kubernetes (including blue-green and canary strategies), and how to verify that the rollback succeeded.

🔄 Rollback Strategies

Rollback Types

Application Rollback: Roll back to the previous application version
Configuration Rollback: Roll back to the previous configuration
Database Rollback: Roll back database changes
Infrastructure Rollback: Roll back infrastructure changes

Rollback Triggers

Health Check Failures: Health checks keep failing instead of recovering
High Error Rate: Error rate > 5%
Performance Degradation: P95 latency > 1000ms
Security Incidents: Security breach or compromise
Manual Override: An operator decides to roll back
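The two numeric triggers above can be checked mechanically. The following is a minimal sketch, not part of the StillMe scripts: `should_rollback` is a hypothetical helper, and it assumes the error rate (in percent) and the P95 latency (in milliseconds) have already been obtained from your monitoring system.

```shell
#!/usr/bin/env bash
# Sketch: decide whether observed metrics cross the rollback thresholds.
# should_rollback is a hypothetical helper, not an existing StillMe script.

ERROR_RATE_MAX_PCT=5      # from "Error rate > 5%"
P95_LATENCY_MAX_MS=1000   # from "P95 latency > 1000ms"

# Usage: should_rollback <error_rate_pct> <p95_latency_ms>
# Prints the reason and returns 0 if a threshold is exceeded, returns 1 otherwise.
should_rollback() {
  local error_rate="$1" p95_ms="$2"
  # awk performs the floating-point comparison that the shell cannot do natively.
  if awk -v v="$error_rate" -v max="$ERROR_RATE_MAX_PCT" 'BEGIN { exit !(v > max) }'; then
    echo "error rate ${error_rate}% exceeds ${ERROR_RATE_MAX_PCT}%"
    return 0
  fi
  if awk -v v="$p95_ms" -v max="$P95_LATENCY_MAX_MS" 'BEGIN { exit !(v > max) }'; then
    echo "P95 latency ${p95_ms}ms exceeds ${P95_LATENCY_MAX_MS}ms"
    return 0
  fi
  return 1
}
```

Both comparisons are strict, so a value exactly at the threshold (for example an error rate of 5) does not trigger a rollback.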

🐳 Docker Rollback

Quick Rollback

# Rollback to previous version
make rollback TAG=v1.2.3

# Or use rollback script directly
./scripts/rollback.sh --tag v1.2.3

Manual Docker Rollback

1. Identify Current Version

# Check current running version
docker ps --filter "name=stillme" --format "table {{.Image}}"

# Check available versions
docker images stillme --format "table {{.Tag}}"

2. Stop Current Container

# Stop current container
docker stop stillme-prod

# Remove current container
docker rm stillme-prod

3. Start Previous Version

# Start previous version
docker run -d \
  --name stillme-prod \
  --restart unless-stopped \
  -p 8080:8080 \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/logs:/app/logs" \
  stillme:v1.2.3

# Verify rollback
curl http://localhost:8080/healthz
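A freshly started container may need a few seconds before its health endpoint responds, so a single `curl` can fail spuriously. The sketch below retries a probe command a fixed number of times. `wait_until_healthy` is a hypothetical helper, and the attempt count and delay in the example are assumed values rather than project settings.

```shell
#!/usr/bin/env bash
# Sketch: retry a health probe until it passes or the attempts run out.
# wait_until_healthy is a hypothetical helper, not an existing StillMe script.

# Usage: wait_until_healthy <max_attempts> <delay_seconds> <command...>
wait_until_healthy() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the probe command; its output is discarded, only the exit code matters.
    if "$@" >/dev/null 2>&1; then
      echo "healthy after ${attempt} attempt(s)"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "still unhealthy after ${max_attempts} attempt(s)" >&2
  return 1
}

# Example (assumed values: 10 attempts, 3 seconds apart):
#   wait_until_healthy 10 3 curl -fsS http://localhost:8080/healthz
```

Passing the probe as a command keeps the helper independent of the transport, so the same loop can wrap `curl`, `docker inspect`, or a `kubectl` check.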

Docker Compose Rollback

1. Update docker-compose.prod.yml

# docker-compose.prod.yml
services:
  stillme-prod:
    image: stillme:v1.2.3  # Previous version
    # ... rest of configuration

2. Deploy Previous Version

# Deploy previous version
docker-compose -f docker-compose.prod.yml up -d --force-recreate

# Verify rollback
docker-compose -f docker-compose.prod.yml ps
curl http://localhost:8080/healthz

☸️ Kubernetes Rollback

Deployment Rollback

1. Check Deployment History

# Check deployment history
kubectl rollout history deployment/stillme -n stillme

# Check specific revision
kubectl rollout history deployment/stillme --revision=2 -n stillme

2. Rollback to Previous Version

# Rollback to previous version
kubectl rollout undo deployment/stillme -n stillme

# Rollback to specific revision
kubectl rollout undo deployment/stillme --to-revision=2 -n stillme

3. Verify Rollback

# Check rollout status
kubectl rollout status deployment/stillme -n stillme

# Check pod status
kubectl get pods -n stillme

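# Check service health (port-forward blocks, so run it in the background)
kubectl port-forward svc/stillme 8080:8080 -n stillme &
PF_PID=$!
sleep 2  # give the tunnel time to open
curl http://localhost:8080/healthz
kill "$PF_PID"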

Blue-Green Rollback

1. Switch Traffic Back

# Switch traffic back to blue
kubectl patch service stillme -n stillme -p '{"spec":{"selector":{"version":"blue"}}}'

# Verify traffic switch
kubectl get svc stillme -n stillme -o yaml

2. Scale Down Green

# Scale down green deployment
kubectl scale deployment stillme-green --replicas=0 -n stillme

# Verify green is scaled down
kubectl get pods -n stillme
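Hand-typing the selector JSON during an incident is error-prone. As a sketch, the patch body can be generated from the target color and validated first. `selector_patch` is a hypothetical helper; the `version` label and the `blue`/`green` values are taken from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: build the Service selector patch for a blue-green switch.
# selector_patch is a hypothetical helper, not an existing StillMe script.

# Usage: selector_patch <blue|green>
# Prints the JSON patch body, or fails for any other value.
selector_patch() {
  case "$1" in
    blue|green)
      printf '{"spec":{"selector":{"version":"%s"}}}\n' "$1"
      ;;
    *)
      echo "unknown color: $1 (expected blue or green)" >&2
      return 1
      ;;
  esac
}

# Example:
#   kubectl patch service stillme -n stillme -p "$(selector_patch blue)"
```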

Canary Rollback

1. Pause Canary

# Pause canary rollout (assumes an Argo Rollouts Rollout resource; custom resources need a merge patch)
kubectl patch rollout stillme --type merge -p '{"spec":{"paused":true}}' -n stillme

# Check rollout status
kubectl get rollout stillme -n stillme

2. Rollback Canary

# Roll the canary back to the stable revision (assumes the Argo Rollouts kubectl plugin is installed)
kubectl argo rollouts undo stillme --to-revision=1 -n stillme

# Verify rollback
kubectl argo rollouts status stillme -n stillme

🔧 Automated Rollback

Rollback Script

Script Usage

# Basic rollback
./scripts/rollback.sh --tag v1.2.3

# Rollback with namespace
./scripts/rollback.sh --tag v1.2.3 --namespace production

# Rollback with service name
./scripts/rollback.sh --tag v1.2.3 --service stillme-api

# Rollback with custom health check URL
./scripts/rollback.sh --tag v1.2.3 --url http://api.stillme.ai/healthz

Script Options

Option	Description	Default
`-t, --tag`	Previous tag to rollback to	Required
`-n, --namespace`	Kubernetes namespace	`stillme`
`-s, --service`	Service name	`stillme-prod`
`-u, --url`	Health check URL	`http://localhost:8080/healthz`
`-h, --help`	Show help message	-
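The options table maps naturally onto a small argument parser. The following is an illustrative sketch of how such parsing could look, not the actual contents of `scripts/rollback.sh`. The variable names (`TAG`, `NAMESPACE`, `SERVICE`, `HEALTH_URL`) and the `parse_args` function are assumptions; the flags and defaults come from the table.

```shell
#!/usr/bin/env bash
# Sketch: parse the documented rollback options (hypothetical, not the real script).

# Defaults taken from the options table.
TAG=""
NAMESPACE="stillme"
SERVICE="stillme-prod"
HEALTH_URL="http://localhost:8080/healthz"

parse_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -t|--tag|-n|--namespace|-s|--service|-u|--url)
        # Every one of these options takes a value; fail early if it is missing.
        if [ "$#" -lt 2 ]; then
          echo "error: $1 requires a value" >&2
          return 2
        fi
        case "$1" in
          -t|--tag)       TAG="$2" ;;
          -n|--namespace) NAMESPACE="$2" ;;
          -s|--service)   SERVICE="$2" ;;
          -u|--url)       HEALTH_URL="$2" ;;
        esac
        shift 2
        ;;
      -h|--help)
        echo "usage: rollback.sh --tag TAG [--namespace NS] [--service NAME] [--url URL]"
        return 0
        ;;
      *)
        echo "error: unknown option: $1" >&2
        return 2
        ;;
    esac
  done
  # --tag is the only required option.
  if [ -z "$TAG" ]; then
    echo "error: --tag is required" >&2
    return 2
  fi
}
```

Checking for the missing value before `shift 2` matters: without it, a trailing `--tag` with no argument would leave the loop unable to make progress.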

GitHub Actions Rollback

Manual Rollback Workflow

# .github/workflows/rollback.yml
name: Manual Rollback

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to rollback'
        required: true
        default: 'production'
        type: choice
        options:
        - production
        - staging
      tag:
        description: 'Tag to rollback to'
        required: true
        type: string

jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment }}
    
    steps:
    - uses: actions/checkout@v4
    
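    - name: Rollback deployment
      env:
        ROLLBACK_TAG: ${{ github.event.inputs.tag }}
        ROLLBACK_ENV: ${{ github.event.inputs.environment }}
      run: |
        # Pass inputs through env vars so they are not interpolated into the script text
        ./scripts/rollback.sh --tag "$ROLLBACK_TAG" --namespace "$ROLLBACK_ENV"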
    
    - name: Verify rollback
      run: |
        curl -f http://localhost:8080/healthz
        curl -f http://localhost:8080/readyz

Trigger Rollback

# Trigger rollback via GitHub CLI
gh workflow run rollback.yml -f environment=production -f tag=v1.2.3

# Or via GitHub UI
# Go to Actions > Manual Rollback > Run workflow
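Because the tag is typed by hand when the workflow is triggered, it can help to validate it before calling `gh`. This is a sketch under one assumption: release tags follow the `vMAJOR.MINOR.PATCH` shape seen in the examples (such as `v1.2.3`). `is_release_tag` is a hypothetical helper; adjust the pattern if the project uses a different scheme.

```shell
#!/usr/bin/env bash
# Sketch: reject malformed tags before triggering the rollback workflow.
# Assumption: tags look like v1.2.3. is_release_tag is a hypothetical helper.

# Usage: is_release_tag <tag>
# Returns 0 if the whole string matches vMAJOR.MINOR.PATCH, 1 otherwise.
is_release_tag() {
  # The value is passed through the environment so awk does not process
  # backslash escapes in it; ^ and $ anchor the entire string, not a line.
  CANDIDATE_TAG="$1" awk 'BEGIN {
    exit !(ENVIRON["CANDIDATE_TAG"] ~ /^v[0-9]+\.[0-9]+\.[0-9]+$/)
  }'
}

# Example:
#   is_release_tag "$TAG" && gh workflow run rollback.yml -f environment=production -f tag="$TAG"
```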

📊 Rollback Verification

Health Checks

1. Service Health

# Check liveness probe
curl http://localhost:8080/healthz

# Check readiness probe
curl http://localhost:8080/readyz

# Check metrics endpoint
curl http://localhost:8080/metrics

2. Application Health

# Check application logs
kubectl logs -f deployment/stillme -n stillme

# Check for errors
kubectl logs deployment/stillme -n stillme | grep ERROR

# Check resource usage
kubectl top pods -n stillme

Performance Verification

1. Load Testing

# Run load tests
make load-test

# Check performance metrics
curl http://localhost:8080/metrics | grep http_request_duration

2. SLO Compliance

# Check SLO compliance
curl http://localhost:8080/metrics | grep -E "(p95|error_rate|availability)"
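`grep` shows the matching lines but leaves the comparison to the reader. The sketch below extracts a single value from Prometheus-style text so it can be compared against a threshold in a script. `metric_value` is a hypothetical helper, the metric names in the example (`stillme_error_rate`, `stillme_p95_ms`) are invented for illustration, and the parser assumes samples carry no trailing timestamp.

```shell
#!/usr/bin/env bash
# Sketch: pull one sample value out of Prometheus-style metrics text.
# metric_value is a hypothetical helper; real metric names may differ.

# Usage: curl -s http://localhost:8080/metrics | metric_value <metric_name>
# Prints the value of the first sample with that name; returns 1 if none is found.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP and TYPE comment lines
    {
      sample = $1
      sub(/[{].*/, "", sample)          # drop a {label="..."} suffix if present
      if (sample == name) {
        print $NF                       # value is the last field (no timestamp assumed)
        found = 1
        exit
      }
    }
    END { exit !found }
  '
}

# Example with an invented metric name:
#   p95="$(curl -s http://localhost:8080/metrics | metric_value stillme_p95_ms)"
```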

Security Verification

1. Security Headers

# Check security headers
curl -I http://localhost:8080/ | grep -E "(X-|Strict-|Content-Security)"
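The `grep` above lists whichever security headers happen to be present, which makes a missing header easy to overlook. As a sketch, the check can be inverted to report the headers that are absent. `missing_headers` is a hypothetical helper, and the set of headers you require is a policy decision this guide does not specify; the names in the example are common choices, not a StillMe requirement.

```shell
#!/usr/bin/env bash
# Sketch: report required response headers that are absent.
# missing_headers is a hypothetical helper, not an existing StillMe script.

# Usage: curl -sI http://localhost:8080/ | missing_headers Header-One Header-Two ...
# Prints each requested header that is missing; returns 1 if any is missing.
missing_headers() {
  local headers name status=0
  headers="$(cat)"   # read the response headers from stdin
  for name in "$@"; do
    # Header names are case-insensitive; match "Name:" at the start of a line.
    if ! printf '%s\n' "$headers" | grep -qi "^${name}:"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Example (header list is an assumption):
#   curl -sI http://localhost:8080/ | missing_headers \
#     Strict-Transport-Security X-Content-Type-Options Content-Security-Policy
```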

2. Security Scans

# Run security scans
make security

# Check security compliance
curl http://localhost:8080/security/status

🚨 Emergency Rollback

Emergency Procedures

1. Immediate Rollback

# Emergency rollback script
./scripts/emergency_rollback.sh

# Or manual emergency rollback
kubectl rollout undo deployment/stillme -n stillme
kubectl rollout status deployment/stillme -n stillme

2. Kill Switch Activation

# Activate kill switch
curl -X POST http://localhost:8080/security/kill-switch/activate

# Verify kill switch
curl http://localhost:8080/security/kill-switch/status

3. Service Isolation

# Scale down service
kubectl scale deployment stillme --replicas=0 -n stillme

# Or, as a last resort, delete the deployment
kubectl delete deployment stillme -n stillme

Emergency Contacts

On-Call Engineer: +1-XXX-XXX-XXXX
Security Team: security@stillme.ai
Management: management@stillme.ai
Incident Response: incident@stillme.ai

📋 Rollback Checklist

Pre-Rollback Checklist

Identify Issue: Document the issue requiring rollback
Assess Impact: Determine scope and impact of rollback
Notify Team: Inform relevant team members
Backup Data: Ensure data is backed up
Test Rollback: Test rollback procedure in staging
Prepare Rollback: Identify target version for rollback

Rollback Execution Checklist

Stop Traffic: Stop traffic to affected service
Execute Rollback: Run rollback procedure
Verify Health: Check service health endpoints
Test Functionality: Verify core functionality
Monitor Metrics: Watch key performance metrics
Check Logs: Review application logs for errors

Post-Rollback Checklist

Verify Rollback: Confirm rollback was successful
Monitor System: Monitor system for 24-48 hours
Document Incident: Document incident and rollback
Root Cause Analysis: Investigate root cause
Update Procedures: Update rollback procedures if needed
Team Communication: Communicate status to team

🔍 Troubleshooting

Common Rollback Issues

1. Rollback Fails

# Check deployment status
kubectl get deployment stillme -n stillme

# Check pod status
kubectl get pods -n stillme

# Check events
kubectl get events -n stillme --sort-by='.lastTimestamp'

# Check logs
kubectl logs deployment/stillme -n stillme

2. Health Checks Fail

# Check health endpoints
curl -v http://localhost:8080/healthz
curl -v http://localhost:8080/readyz

# Check service configuration
kubectl describe service stillme -n stillme

# Check ingress
kubectl describe ingress stillme -n stillme

3. Performance Issues

# Check resource usage
kubectl top pods -n stillme
kubectl top nodes

# Check metrics
curl http://localhost:8080/metrics

# Check for resource limits
kubectl describe pod -l app=stillme -n stillme

Debug Commands

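# Debug pod with an ephemeral container (replace stillme-xxx with the pod name; busybox is an example image)
kubectl debug -it pod/stillme-xxx --image=busybox -n stillme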

# Port forward for debugging
kubectl port-forward svc/stillme 8080:8080 -n stillme

# Execute commands in pod
kubectl exec -it deployment/stillme -n stillme -- /bin/bash

# Check configuration
kubectl describe configmap stillme-config -n stillme
kubectl describe secret stillme-secrets -n stillme

📚 Best Practices

Rollback Best Practices

Automated Rollback: Implement automated rollback triggers
Health Checks: Comprehensive health check validation
Monitoring: Real-time monitoring during rollback
Documentation: Document all rollback procedures
Testing: Regular rollback testing in staging

Prevention Best Practices

Staging Testing: Thorough testing in staging environment
Canary Deployments: Use canary deployments for risky changes
Feature Flags: Use feature flags for gradual rollouts
Monitoring: Comprehensive monitoring and alerting
Backup Strategy: Regular backups and recovery testing

Communication Best Practices

Incident Communication: Clear incident communication
Status Updates: Regular status updates during rollback
Post-Incident Review: Conduct post-incident reviews
Lessons Learned: Document and share lessons learned
Team Training: Regular team training on rollback procedures

🔗 Additional Resources

Documentation

Tools

Support

Documentation: docs/
Issues: GitHub Issues
Security: SECURITY.md
Community: GitHub Discussions

Last Updated: $(date)
Next Review: $(date -d "+3 months")
Maintainer: StillMe DevOps Team
