helm: fix vendor.dcgm.enabled: false has no effect#214
Open
shravyavorugallu wants to merge 1 commit into
Open
Conversation
Fixes SlinkyProject#210 The vendor.dcgm.enabled helper used Helm's ternary function via a pipe: ($vendor | dig "nvidia" "dcgm" "enabled" false) | ternary "true" "" Helm's ternary receives the piped value as the condition argument. When the piped value is a Go boolean false (as returned by dig when vendor.nvidia.dcgm.enabled is explicitly set to false), Sprig's ternary implementation reflects the value and can evaluate it as truthy, causing the DCGM prolog/epilog configmap refs to always be injected into the Controller CR regardless of the setting. This breaks CPU-only clusters (minikube, non-GPU nodes) because the operator tries to mount DCGM prolog scripts that were never created. Fix: replace ternary with a native Go template if block. The native if correctly evaluates boolean false as falsy in all Helm versions: {{- if ($vendor | dig "nvidia" "dcgm" "enabled" false) -}} true {{- end -}} The rendered output of vendor.dcgm.enabled is unchanged (returns "true" when enabled, "" when disabled), so all callsites in controller-cr.yaml and the configmap templates are unaffected. Signed-off-by: Shravya Vorugallu <shravyavorugallu@gmail.com>
Contributor
|
Good morning, Your issue indicates that this bug is present on {{/*
Check if DCGM integration is enabled
*/}}
{{- define "vendor.dcgm.enabled" -}}
{{- .Values.vendor.nvidia.dcgm.enabled -}}
{{- end }}The logic in If you can confirm that that commit does resolve the behavior that you are seeing, I can look into backporting that commit to Best, |
Author
|
Hi Vivian,
Thanks for the clarification, that helps.
I can confirm that the behavior I’m seeing does reproduce on v1.1.0, where
setting "vendor.dcgm.enabled: false" has no effect due to the logic you
pointed out.
I’ve also reviewed commit "f1786e4" on "main", and it does appear to
address the issue. Based on that, a backport to "release-1.0" and
"release-1.1" would be great.
Thankyou,
Shravya Vorugallu.
…On Tue, 16 Jun, 2026, 09:09 Vivian Hafener, ***@***.***> wrote:
*vivian-hafener* left a comment (SlinkyProject/slurm-operator#214)
<#214 (comment)>
Good morning,
Your issue indicates that this bug is present on v1.1.0, but the root
cause you present as problematic is only present on main. On v1.1.0, the
logic is as follows
<https://github.com/SlinkyProject/slurm-operator/blob/v1.1.0/helm/slurm/templates/vendor/nvidia/dcgm/_helpers.tpl#L7C1-L12C11>
:
{{/*Check if DCGM integration is enabled*/}}
{{- define "vendor.dcgm.enabled" -}}
{{- .Values.vendor.nvidia.dcgm.enabled -}}
{{- end }}
The logic in v1.1.0 was broken, which did result in vendor.dcgm.enabled:
false having no effect. However, I believe that this was resolved by
f1786e4
<f1786e4>
on main.
If you can confirm that that commit does resolve the behavior that you are
seeing, I can look into backporting that commit to release-1.0 and
release-1.1.
Best,
Vivian Hafener
—
Reply to this email directly, view it on GitHub
<#214?email_source=notifications&email_token=APKSJNDTFBAFUFGT67UQ2YD5AFWLRA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINZSGA4DINBVGIY2M4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2KYZTPN52GK4S7MNWGSY3L#issuecomment-4720844521>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/APKSJNHUEX35G2L56OU4JRL5AFWLRAVCNFSNUABFKJSXA33TNF2G64TZHM4DQMZYGQ2TONZXHNEXG43VMU5TINRXGEYDAOBTHE32C5QC>
.
Triage notifications, keep track of coding agent tasks and review pull
requests on the go with GitHub Mobile for iOS
<https://github.com/notifications/mobile/ios/APKSJNFTOU3ZFSJKI4F3YAL5AFWLRA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINZSGA4DINBVGIY2M4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2KUZTPN52GK4S7NFXXG>
and Android
<https://github.com/notifications/mobile/android/APKSJNFYPSMRML3ECOJIBY35AFWLRA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINZSGA4DINBVGIY2M4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2K4ZTPN52GK4S7MFXGI4TPNFSA>.
Download it today!
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #210
Root Cause
The
vendor.dcgm.enabledhelper in_helpers.tplused Sprig'sternaryfunction via a pipeline:{{- ($vendor | dig "nvidia" "dcgm" "enabled" false) | ternary "true" "" -}}When
vendor.nvidia.dcgm.enabled: falseis set,digreturns a Gobool(false). Sprig'sternaryreflects this value and can evaluate it as truthy, causing DCGM prolog/epilog ConfigMap refs to always be injected into the Controller CR.This breaks CPU-only deployments (minikube, non-GPU nodes) because the operator tries to reconcile DCGM prolog scripts that were never created.
Fix
Replace
ternarywith a native Go templateifblock, which correctly evaluates booleanfalseas falsy in all Helm versions:{{- if ($vendor | dig "nvidia" "dcgm" "enabled" false) -}} true {{- end -}}The rendered output is unchanged:
"true"when enabled,""when disabled.