Pull requests: NVIDIA/NVSentinel
Pull requests list
feat: support for dcgm embedded mode and external hostengine mode in gpu-health-monitor(WIP)
area/ci
area/deployment
area/docs
area/health-monitors
size/XL
#1429
opened Jun 27, 2026 by
deesharma24
Contributor
18 tasks
chore: add a tutorial on how to write a new health monitor
area/docs
size/XL
#1428
opened Jun 26, 2026 by
nitz2407
Contributor
8 of 17 tasks
fix: default kernel-origin syslog checks to SYSLOG_FACILITY=0
area/health-monitors
size/L
#1426
opened Jun 26, 2026 by
jackyliusohu
8 tasks done
feat: prevent DCGM connectivity errors on node bootstrapping
area/deployment
area/docs
area/tests
size/XL
#1425
opened Jun 25, 2026 by
natherz97
Contributor
8 of 18 tasks
[WIP]fix: statemanageer label handling
area/fault-management
area/tests
size/XL
#1421
opened Jun 25, 2026 by
XRFXLP
Member
18 tasks
feat(janitor): implement ExternalRemediationRequest reconciler
area/deployment
area/docs
area/tests
size/XL
#1392
opened Jun 11, 2026 by
jtschelling
Contributor
4 tasks done
feat(monitor): system-services-monitor implementation + unit tests
#1382
opened Jun 10, 2026 by
dmvevents
2 of 5 tasks
ci: register system-services-monitor in build/lint/publish matrices
#1381
opened Jun 10, 2026 by
dmvevents
docs(design): ADR-030 — system-services-monitor scope
#1380
opened Jun 10, 2026 by
dmvevents
feat(mcp-server): merge donation of k8s-gpu-mcp-server
#1333
opened May 25, 2026 by
ArangoGutierrez
Contributor
feat: store maintenance CR references in annotations
area/fault-management
needs-rebase
size/XL
#1327
opened May 22, 2026 by
alexscjundev
7 of 18 tasks
feat: device platform connector
#1198
opened Apr 16, 2026 by
pteranodan
Contributor
8 of 18 tasks
feat: support component-based recommended actions
#1124
opened Apr 7, 2026 by
alexscjundev
6 of 18 tasks
feat: add MongoDB Atlas support via system TLS and URI passthrough
#1069
opened Mar 23, 2026 by
drubinstein
2 of 18 tasks
Initial stub code for k8s datastore provider
#970
opened Mar 7, 2026 by
yavinash007
Contributor
Draft
18 tasks
feat: add fabric-manager-monitor for GPU infrastructure health checks
#891
opened Feb 20, 2026 by
dmvevents
4 of 6 tasks
feat: cloud-native GPU health event management
#795
opened Feb 4, 2026 by
ArangoGutierrez
Contributor
Draft
5 tasks done
WIP - k8s API for healtheventwithstatus model
#640
opened Dec 23, 2025 by
yavinash007
Contributor
Draft
2 of 18 tasks
feat: azure maintenance events
#382
opened Nov 20, 2025 by
jtschelling
Contributor
Draft
8 of 18 tasks