
Make o11y internal-only #155

Open
import-pandas-as-numpy wants to merge 1 commit into main from chore/o11y-migration-start

@import-pandas-as-numpy
Member

Summary

  • move o11y ingress from the public node-IP allowlist model to an internal DigitalOcean load balancer
  • restore private CoreDNS resolution for prom, loki, and grafana in staging and prod using the internal o11y VIP
  • document and ship a manual reconciler for future internal LB VIP drift

Changes

  • switch o11y ingress-nginx service to LoadBalancer with service.beta.kubernetes.io/do-loadbalancer-network: INTERNAL
  • enable TLS for Prometheus and Loki ingress in repo state
  • align the cert-manager ClusterIssuer manifest with the live Cloudflare DNS-01 issuer
  • update staging and prod Alloy remote_write and Loki push endpoints to HTTPS
  • narrow the embedded postgres exporter collector set to avoid unsupported managed-Postgres permission paths
  • set coredns-custom for non-o11y clusters to map prom, loki, and grafana to the internal o11y VIP
  • add scripts/reconcile-o11y-internal-vip.sh and a step-by-step operator guide
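
The contents of scripts/reconcile-o11y-internal-vip.sh are not shown in this description. As a rough illustration of the kind of drift check such a reconciler could perform, here is a minimal Python sketch. It assumes the coredns-custom entry uses CoreDNS `hosts` plugin syntax and that the short names `prom`, `loki`, and `grafana` stand in for the real hostnames; the function names are hypothetical and not taken from the repo.

```python
import re

# Short names from the PR description; the real FQDNs are an assumption.
O11Y_HOSTS = ("prom", "loki", "grafana")

IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def render_hosts_block(vip, hostnames=O11Y_HOSTS):
    """Render a CoreDNS `hosts` block mapping each name to the VIP."""
    lines = ["hosts {"]
    lines += [f"    {vip} {name}" for name in hostnames]
    lines += ["    fallthrough", "}"]
    return "\n".join(lines)


def find_drift(corefile, vip, hostnames=O11Y_HOSTS):
    """Return {hostname: stale_ip_or_None} for names not mapped to `vip`."""
    mapped = {}
    for line in corefile.splitlines():
        parts = line.split()
        # A hosts entry looks like "<ip> <name> [<name> ...]".
        if len(parts) >= 2 and IPV4.fullmatch(parts[0]):
            for name in parts[1:]:
                mapped[name] = parts[0]
    return {n: mapped.get(n) for n in hostnames if mapped.get(n) != vip}
```

An empty result from `find_drift` means every name already points at the current VIP; otherwise `render_hosts_block` produces the replacement text an operator would review before applying.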

Verification

  • verified the new o11y ingress Service uses internal LB 10.124.0.2
  • verified staging and prod resolve prom, loki, and grafana privately to 10.124.0.2
  • verified private HTTPS reachability from both clusters: Prometheus 200, Grafana 200, Loki reachable on /ready
  • verified Alloy logs in staging and prod are clean after cutover
  • verified o11y Prometheus still reports count(up{cluster="staging",job=~".*postgres.*"}) == 2
  • verified o11y Prometheus still reports count(up{cluster="prod",job=~".*postgres.*"}) == 2
  • verified the helper script passes shellcheck
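
The private-resolution check above could be repeated with something like the following sketch. It is not the procedure used for this PR; the hostnames passed in are placeholders, and the `resolve` parameter is a hypothetical hook that defaults to the system resolver so the check can also be exercised without network access.

```python
import socket

EXPECTED_VIP = "10.124.0.2"  # internal LB address reported in this PR


def check_private_resolution(hostnames, expected_ip=EXPECTED_VIP,
                             resolve=socket.gethostbyname):
    """Return (hostname, resolved_ip_or_error) pairs that do not match expected_ip."""
    failures = []
    for host in hostnames:
        try:
            ip = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures.append((host, f"error: {exc}"))
            continue
        if ip != expected_ip:
            failures.append((host, ip))
    return failures
```

Run from a pod or node inside staging or prod with the real hostnames, an empty list would indicate that all three names resolve to the internal VIP.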

@import-pandas-as-numpy requested a review from a team as a code owner March 24, 2026 03:40
