Add pytorch-cuda-fix-xpu-alignment agent skill#3466
Add pytorch-cuda-fix-xpu-alignment agent skill#3466laifenxiawucha wants to merge 42 commits intomainfrom
Conversation
Adds a new agentskills.io-compatible skill package that mines
pytorch/pytorch for backend-divergence bug fixes and generates
minimal Python reproducers for XPU nightly validation.
Skill layout:
SKILL.md - instructions, ranking rubric, guardrails
scripts/ - find_xpu_python, run_collect_env,
run_with_xpu_python, update_torch_xpu_nightly
references/ - GitHub MCP query patterns, local validation guide
There was a problem hiding this comment.
Pull request overview
Skill used: xpu-ops-pr-review.
Adds a new agentskills.io-compatible skill package (pytorch-cuda-xpu-triage) intended to help an agent mine pytorch/pytorch CUDA-fix signals and generate minimal XPU reproducers for local nightly validation.
Changes:
- Introduces a new skill definition (
SKILL.md) plus workflow references for GitHub MCP querying and local XPU validation. - Adds helper bash scripts to (a) locate an XPU-capable Python, (b) run scripts /
collect_envwith it, and (c) install/upgrade XPU nightly wheels.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| .github/skills/pytorch-cuda-xpu-triage/SKILL.md | Defines the triage/mining workflow, ranking rubric, output format, and guardrails. |
| .github/skills/pytorch-cuda-xpu-triage/references/github-mcp-reference.md | Documents GitHub MCP setup and query patterns for mining upstream fixes. |
| .github/skills/pytorch-cuda-xpu-triage/references/local-xpu-validation-reference.md | Provides a checklist and suggested boilerplate for validating repros on XPU nightly. |
| .github/skills/pytorch-cuda-xpu-triage/scripts/find_xpu_python.sh | Auto-detects an interpreter by probing candidates for torch.xpu support. |
| .github/skills/pytorch-cuda-xpu-triage/scripts/run_with_xpu_python.sh | Runs an arbitrary Python script with the detected interpreter. |
| .github/skills/pytorch-cuda-xpu-triage/scripts/run_collect_env.sh | Runs torch.utils.collect_env with the detected interpreter. |
| .github/skills/pytorch-cuda-xpu-triage/scripts/update_torch_xpu_nightly.sh | Upgrades/install torch/vision/audio from the XPU nightly wheel index with optional locking and dry-run. |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new GitHub Copilot skill to mine recent CUDA/backend fixes in pytorch/pytorch, qualify XPU-relevant candidates, and generate local XPU repro/validation steps.
Changes:
- Introduces
pytorch-cuda-xpu-triageskill definition and end-to-end workflow guidance. - Adds reference docs for GitHub MCP search patterns and local XPU validation commands/criteria.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| .github/skills/pytorch-cuda-xpu-triage/SKILL.md | Defines the skill, search/qualification rubric, and expected output/handoff format. |
| .github/skills/pytorch-cuda-xpu-triage/references/github-mcp-reference.md | Documents read-only GitHub search/query patterns and a narrowing heuristic. |
| .github/skills/pytorch-cuda-xpu-triage/references/local-xpu-validation-reference.md | Documents local XPU validation assumptions, commands, and bug confirmation criteria. |
Stonepia
left a comment
There was a problem hiding this comment.
Overall LGTM, we could quick enable for the init stage
|
Next step should land this skills. Currently, this skill only working with local OpenCode/Codex env |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new “pytorch-cuda-xpu-triage” skill to guide discovery of upstream CUDA/backend fixes in pytorch/pytorch and produce minimal local XPU repro/validation steps.
Changes:
- Adds a skill definition (
SKILL.md) describing search strategy, qualification gates, and output format. - Adds GitHub MCP query/tooling reference guidance.
- Adds local XPU validation and evidence collection reference guidance.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| .github/skills/pytorch-cuda-xpu-triage/SKILL.md | Defines the triage skill’s workflow, candidate rubric, and required guardrails. |
| .github/skills/pytorch-cuda-xpu-triage/references/github-mcp-reference.md | Documents GitHub search/tool patterns to discover and narrow candidates. |
| .github/skills/pytorch-cuda-xpu-triage/references/local-xpu-validation-reference.md | Provides local XPU run/validation checklists and environment/evidence collection steps. |
Performance outliers, please check!
|
Performance outliers, please check!
|
|
Overall seems ok for now. I have the concern about the pipeline:
|
…ub-mcp-reference - Rename skill folder from pytorch-cuda-xpu-triage to pytorch-cuda-fix-xpu-alignment (this is fix discovery + alignment, not issue triage) - Delete github-mcp-reference.md; inline the example query into SKILL.md Step 1 - Update copilot-instructions.md skill index entry - Update SKILL.md frontmatter and description
After validating a bug on XPU, determine whether the fix belongs in pytorch/pytorch (upstream XPU kernel) or intel/torch-xpu-ops (local kernel). Add routing statistics to the summary.
Remove dispatch-coverage.md and triage-patterns.md — the domain knowledge they contained (coverage signals, triage patterns, key interpretations) is either already in SKILL.md's workflow steps or something the agent should derive from code. Rewrite batch-scan-workflow.md: strip JSON schemas, shell command examples, and Markdown templates. Keep only workflow steps and constraints. Inline the 'where to look' file paths from dispatch-coverage.md into SKILL.md Step 2. Before: 4 files, 342 lines After: 2 files, 112 lines (comparable to PR #3466's 109 lines)
Summary
Add a new agent skill (
pytorch-cuda-fix-xpu-alignment) that discovers CUDA and other backend bug fixes inpytorch/pytorchand validates whether the same bugs exist on XPU.Input / Output
Input: A time window (e.g., "last 1 day") and target repository (
pytorch/pytorch).Output: A local
triage_scan_<date>.mdfile containing:Workflow example
User: "Scan pytorch/pytorch for CUDA bug fixes in the last 1 day and check if they affect XPU."
The skill then:
pytorch/pytorchvia GitHub MCP for recent backend bug-fix signals (issues, PRs, commits). Save a lightweight candidate list totriage_scan_<date>.md.torch.xpuWhat is included
.github/skills/pytorch-cuda-fix-xpu-alignment/SKILL.md.github/skills/pytorch-cuda-fix-xpu-alignment/references/local-xpu-validation-reference.md.github/copilot-instructions.md