
Separate webhooks to subcommand / allow webhook HA #5010

Description

@pavolloffay

Component(s)

auto-instrumentation

Is your feature request related to a problem? Please describe.

I would like to ensure high availability for the instrumentation webhook. Right now the webhook is part of the operator controller deployment, which runs a single replica with failurePolicy: Ignore (if the webhook is not reachable, pod creation continues without instrumentation).

HA is required to ensure that all annotated pods in the cluster are instrumented. The HA setup should handle operator upgrades and cluster disruptions (node drains, node failures).

The instrumentation uses a pod mutation webhook deployed as part of the operator controller.
Kubernetes controllers are usually deployed as singletons (1 replica). If multiple instances are deployed, leader election is used to select the active leader.

Webhooks, by contrast, can be deployed and scaled separately. The API server calls the webhook via a Service that load-balances requests across all instances.
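
For illustration, a minimal sketch of a pod mutation webhook registration that goes through a Service and uses failurePolicy: Ignore, as described above. All names, the namespace, and the path are hypothetical placeholders, not the operator's actual manifest, and the caBundle/certificate wiring is omitted.

```yaml
# Sketch only: names, namespace, and path are assumptions for illustration.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: example-instrumentation-webhook
webhooks:
  - name: mpod.example.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    # Ignore: if no webhook instance is reachable, the pod is admitted unmodified.
    failurePolicy: Ignore
    clientConfig:
      # The API server calls the Service, which spreads requests
      # across every ready webhook replica behind it.
      service:
        name: example-webhook-service
        namespace: example-operator-system
        path: /mutate-v1-pod
        port: 443
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
```

Because the API server only knows about the Service, adding webhook replicas behind it requires no change to this registration.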

Describe the solution you'd like

I would like to be able to configure/opt-in for the instrumentation webhook HA.

I should be able to:

  • scale the webhook to 2 or more replicas

The operator should automatically:

  • define a PodDisruptionBudget for the webhook/operator deployment
  • define PodAntiAffinity for the webhook/operator deployment
  • define a PriorityClass for the webhook/operator deployment
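
As a rough sketch of what the operator could generate for a separately deployed webhook, the manifests below combine the three items above with a 2-replica Deployment. Every name, label, namespace, image, and the priority value are hypothetical; the preferred (soft) anti-affinity is one possible choice, not a requirement stated in this issue.

```yaml
# Sketch only: all names, labels, the image, and the priority value are assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: example-webhook-priority
value: 1000000
globalDefault: false
description: Priority for the instrumentation webhook pods (illustrative).
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-webhook-pdb
  namespace: example-operator-system
spec:
  # Keep at least one webhook pod running during voluntary disruptions (e.g. node drain).
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: webhook
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-webhook
  namespace: example-operator-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/component: webhook
  template:
    metadata:
      labels:
        app.kubernetes.io/component: webhook
    spec:
      priorityClassName: example-webhook-priority
      affinity:
        podAntiAffinity:
          # Prefer spreading replicas across nodes; soft rule so small clusters still schedule.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/component: webhook
      containers:
        - name: webhook
          image: example.invalid/webhook:latest
```

With minAvailable: 1 and two replicas, a node drain can evict only one webhook pod at a time, so the Service always has at least one endpoint.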

Describe alternatives you've considered

No response

Additional context

How other projects handle HA for webhooks

Cert manager

  apiVersion: operator.openshift.io/v1alpha1
  kind: CertManager
  metadata:
    name: cluster
  spec:
    webhookConfig:
      overrideReplicas: 3

The cert-manager operator creates the webhook object and deployment: https://github.com/openshift/cert-manager-operator/blob/master/bundle/manifests/cert-manager-operator.clusterserviceversion.yaml#L461

Istio

  apiVersion: sailoperator.io/v1
  kind: Istio
  metadata:
    name: default
  spec:
    namespace: istio-system
    values:
      pilot:
        autoscaleEnabled: false
        replicaCount: 2

OpenShift COO operator

TODOs
