Component(s)
auto-instrumentation
Is your feature request related to a problem? Please describe.
I would like to ensure high availability (HA) for the instrumentation webhook. Right now the webhook is part of the operator controller deployment, which runs a single replica with failurePolicy: Ignore (if the webhook is not reachable, the pod still starts, but without instrumentation).
HA is required to ensure all annotated pods in the cluster are instrumented. The HA setup should handle operator upgrades and cluster disruptions (node drains, failures).
The instrumentation uses a pod mutation webhook deployed as part of the operator controller.
Kubernetes controllers are usually deployed as singletons (1 replica). If multiple instances are deployed, leader election is used to select the active instance.
Webhooks, in contrast, can be deployed and scaled separately. The API server calls the webhook via a Service that load-balances requests across all instances.
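For reference, the setup described above corresponds roughly to a webhook registration like the one below. This is a minimal sketch, not the operator's actual manifest: the resource names, namespace, Service name, and path are hypothetical placeholders. Only `failurePolicy: Ignore` and the fact that the API server reaches the webhook through a Service are taken from the description above.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: example-instrumentation-webhook      # hypothetical name
webhooks:
  - name: mpod.example.com                   # hypothetical webhook name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    # With Ignore, pod creation proceeds unmodified when the webhook
    # cannot be reached, so the pod ends up without instrumentation.
    failurePolicy: Ignore
    clientConfig:
      # The API server calls the Service, which spreads requests
      # across all ready webhook pods behind it.
      # caBundle is omitted here; it is typically injected separately.
      service:
        name: example-webhook-service        # hypothetical
        namespace: example-operator-system   # hypothetical
        path: /mutate-v1-pod                 # hypothetical
        port: 443
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
```

Because the Service selects every ready pod with matching labels, adding webhook replicas needs no change to this registration.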
Describe the solution you'd like
I would like to be able to opt in to and configure HA for the instrumentation webhook.
I should be able to:
- scale the webhook to 2 or more replicas
The operator should automatically:
- define a PodDisruptionBudget for the webhook/operator deployment
- define pod anti-affinity for the webhook/operator deployment
- define a PriorityClass for the webhook/operator deployment
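As an illustration of what the operator could generate for the items above, here is a minimal sketch using standard Kubernetes resources. All names, labels, the namespace, the image, and the priority value are hypothetical, and it assumes the webhook runs in its own Deployment; none of this reflects an existing operator API.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: example-webhook-priority             # hypothetical
value: 1000000                               # assumed value, tune per cluster
globalDefault: false
description: "Priority for the instrumentation webhook pods (example)."
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-webhook-pdb                  # hypothetical
  namespace: example-operator-system         # hypothetical
spec:
  # Keep at least one webhook pod running during voluntary
  # disruptions such as node drains.
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: example-webhook
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-webhook                      # hypothetical
  namespace: example-operator-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: example-webhook
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-webhook
    spec:
      priorityClassName: example-webhook-priority
      affinity:
        podAntiAffinity:
          # Prefer placing replicas on different nodes so that a
          # single node failure does not take down every instance.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: example-webhook
      containers:
        - name: webhook
          image: example.com/webhook:latest  # placeholder image
```

The anti-affinity rule is written as a preference rather than a hard requirement so that both replicas can still be scheduled on a single-node cluster. Whether it should be `requiredDuringSchedulingIgnoredDuringExecution` instead is a design choice left open here.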
Describe alternatives you've considered
No response
Additional context
How other projects handle HA for webhooks
Cert manager
apiVersion: operator.openshift.io/v1alpha1
kind: CertManager
metadata:
  name: cluster
spec:
  webhookConfig:
    overrideReplicas: 3
The cert-manager operator creates the webhook object and deployment: https://github.com/openshift/cert-manager-operator/blob/master/bundle/manifests/cert-manager-operator.clusterserviceversion.yaml#L461
Istio
apiVersion: sailoperator.io/v1
kind: Istio
metadata:
  name: default
spec:
  namespace: istio-system
  values:
    pilot:
      autoscaleEnabled: false
      replicaCount: 2
OpenShift COO operator
TODOs