What would you like to be added?
The Experiment validating webhook currently only validates batch/v1 Job trial specs in validateTrialJob(). When the trial template uses a TrainJob (or any other supported Kubeflow job kind), the function returns nil without any structural validation:
https://github.com/kubeflow/katib/blob/master/pkg/webhook/v1beta1/experiment/validator/validator.go#L440-L459
func (g *DefaultValidator) validateTrialJob(runSpec *unstructured.Unstructured) error {
	gvk := runSpec.GroupVersionKind()
	// Only validates batch/v1 Job; everything else is silently skipped
	if gvk.GroupVersion() != batchv1.SchemeGroupVersion || gvk.Kind != consts.JobKindJob {
		return nil
	}
	// ...
}
This means a user can submit an Experiment with a malformed TrainJob trial template, and the webhook will accept it without error. The misconfiguration only surfaces at runtime when the controller tries to create the trial.
Proposed Changes
- Add validation for TrainJob in validateTrialJob(): when kind: TrainJob, verify the unstructured spec can be converted to a valid TrainJob structure.
- Extract the hardcoded "TrainJob" strings in constants.go and experiment_defaults.go into a named constant (JobKindTrainJob).
- Fix the copy-paste comment on DefaultTrainJobPrimaryPodLabels (line 49 of constants.go); the comment incorrectly says DefaultKubeflowJobPrimaryPodLabels.
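A minimal sketch of what the TrainJob branch could look like, using only the standard library so it runs on its own. Everything here is illustrative: `trainJobSketch` is a hypothetical, heavily trimmed stand-in for the real TrainJob type, `validateTrainJobSketch` is a hypothetical helper name, and the requirement that `spec.runtimeRef.name` be non-empty is an assumption about what "valid" should mean. An actual change would presumably convert `runSpec.Object` into the Kubeflow Trainer API type (for example with `runtime.DefaultUnstructuredConverter.FromUnstructured` from k8s.io/apimachinery) instead of round-tripping through JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JobKindTrainJob is the named constant proposed above.
const JobKindTrainJob = "TrainJob"

// trainJobSketch is a hypothetical, trimmed stand-in for the real TrainJob
// type; it only models the fields this sketch checks.
type trainJobSketch struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Spec       struct {
		RuntimeRef struct {
			Name string `json:"name"`
		} `json:"runtimeRef"`
	} `json:"spec"`
}

// validateTrainJobSketch mimics the proposed check: objects of any other
// kind are skipped, while a TrainJob must decode into the typed struct
// and (by assumption) name a runtime.
func validateTrainJobSketch(obj map[string]interface{}) error {
	if kind, _ := obj["kind"].(string); kind != JobKindTrainJob {
		return nil
	}
	raw, err := json.Marshal(obj)
	if err != nil {
		return fmt.Errorf("unable to serialize trial spec: %w", err)
	}
	var tj trainJobSketch
	// A type mismatch (e.g. runtimeRef given as a string) fails here.
	if err := json.Unmarshal(raw, &tj); err != nil {
		return fmt.Errorf("unable to convert trial spec to %s: %w", JobKindTrainJob, err)
	}
	if tj.Spec.RuntimeRef.Name == "" {
		return fmt.Errorf("%s trial spec must set spec.runtimeRef.name", JobKindTrainJob)
	}
	return nil
}

func main() {
	// runtimeRef is a string instead of an object, so conversion fails.
	malformed := map[string]interface{}{
		"apiVersion": "trainer.kubeflow.org/v1alpha1",
		"kind":       "TrainJob",
		"spec":       map[string]interface{}{"runtimeRef": "torch-distributed"},
	}
	fmt.Println(validateTrainJobSketch(malformed))
}
```

With a check along these lines in the webhook, the malformed template in `main` would be rejected at admission time instead of failing when the controller creates the trial.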
Why it matters
- TrainJob is already a supported trial kind in Katib: it is listed in KubeflowJobKinds and has dedicated success/failure conditions and pod labels.
- The upcoming OptimizationJob CRD (#2605) will use TrainJob exclusively for trial workloads, making this validation even more important.
- Catches misconfigurations at admission time rather than at runtime.
/kind feature
/area api