Skip to content

Filter out malformed nvidia-smi process_name XML tag#1910

Open
jfennick wants to merge 2 commits intocommon-workflow-language:mainfrom
jfennick:nvidia_smi_xml_fix
Open

Filter out malformed nvidia-smi process_name XML tag#1910
jfennick wants to merge 2 commits intocommon-workflow-language:mainfrom
jfennick:nvidia_smi_xml_fix

Conversation

@jfennick
Copy link
Copy Markdown
Contributor

No description provided.

@codecov
Copy link
Copy Markdown

codecov Bot commented Sep 19, 2023

Codecov Report

❌ Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.86%. Comparing base (b0c8dda) to head (239ce5a).
⚠️ Report is 287 commits behind head on main.

Files with missing lines Patch % Lines
cwltool/cuda.py 94.44% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1910      +/-   ##
==========================================
- Coverage   83.98%   82.86%   -1.12%     
==========================================
  Files          46       46              
  Lines        8190     8200      +10     
  Branches     2174     2138      -36     
==========================================
- Hits         6878     6795      -83     
- Misses        840      903      +63     
- Partials      472      502      +30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jfennick jfennick force-pushed the nvidia_smi_xml_fix branch 2 times, most recently from 67683da to 7945f7d Compare September 19, 2023 19:14
@jfennick jfennick changed the title Replace malformed nvidia-smi XML Filter out malformed nvidia-smi process_name XML tag Sep 19, 2023
@mr-c mr-c requested a review from tetron September 19, 2023 21:55
@jfennick jfennick force-pushed the nvidia_smi_xml_fix branch 7 times, most recently from 8c77a20 to 081818d Compare September 20, 2023 02:35
@jfennick
Copy link
Copy Markdown
Contributor Author

FYI, as I discuss in the block comment in the code, to fix the immediate problem the only change necessary is
nvidia-smi -q -x | grep -v process_name
If that's more likely to get merged sooner, then we can do that for now.

But in an attempt to prevent future problems, I have instead removed all tags except the tags that are actually used. Let me know which version you prefer.

@mr-c mr-c force-pushed the nvidia_smi_xml_fix branch 2 times, most recently from 7bb1989 to ebaeb74 Compare October 2, 2023 09:55
@mr-c mr-c enabled auto-merge (rebase) October 2, 2023 09:56
auto-merge was automatically disabled October 3, 2023 19:19

Head branch was pushed to by a user without write access

@jfennick jfennick force-pushed the nvidia_smi_xml_fix branch from db7dc3b to df408bd Compare October 3, 2023 19:25
Copy link
Copy Markdown
Member

@mr-c mr-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2023-10-03T19:27:36.5351674Z cwltool/context.py:46: initilizer ==> initializer
2023-10-03T19:27:36.5352997Z tests/test_toolargparse.py:183: desription ==> description
2023-10-03T19:27:36.5354037Z tests/test_toolargparse.py:192: desription ==> description
2023-10-03T19:27:36.5354867Z tests/wf/schemadef-bug-1473.cwl:452: cylces ==> cycles
2023-10-03T19:27:36.5441419Z Probable typo foun. Run "make codespell-fix" to accept suggested fixes, or add the word to the ignore list in setup.cfg

Comment thread cwltool/cuda.py
from .utils import CWLObjectType


def cuda_device_count() -> str:
Copy link
Copy Markdown
Member

@mr-c mr-c Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def cuda_device_count() -> str:
def cuda_device_count() -> int:

and update the return as well

Comment thread cwltool/cuda.py
cmd = "nvidia-smi -q -x | grep cuda_version"
try:
out = subprocess.check_output(["nvidia-smi", "-q", "-x"]) # nosec
out = subprocess.check_output(cmd, shell=True) # nosec
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that instead of using a subshell it captures the output and does the substring search in Python. For a simple substring search this really is as simple as something like:

out = subprocess.check_output(["nvidia-smi", "-q", "-x"])
if "cuda_version" not in out:
   raise ...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually instead of using minidom, to pull a single tag it would be easier to just use a regular expression to look for <cuda_version>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants