
[CI] Enhance AI-generated PR workflow with PR description reproducers and lint failure comments #3488

Open
chuanqi129 wants to merge 1 commit into `main` from `ci/ai-reproducer-v2`

Conversation

@chuanqi129
Contributor

Summary

  • Support PR description reproducer commands: AI-generated PRs can now specify UT commands to run via `reproducer-cmd` fenced code blocks in the PR description, in addition to (or instead of) `test/repro/test_*.py` files
  • Comment @copilot on lint failures: When Python lint or Clang format checks fail on ai_generated PRs, automatically comment @copilot with the failure details and job link
  • Only run changed reproducer files: The reproducer files step now only runs files that were added or modified in the PR (via git diff --diff-filter=AM), not all files in test/repro/
  • Split reproducer into two steps: Reproducer file tests and PR description commands are now separate workflow steps for clearer logs
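
As a hedged sketch of the feature described above (the file name, command, and `sed` extraction are illustrative placeholders, not the workflow's actual implementation), a PR body carrying a `reproducer-cmd` block and one way to pull the command out of it might look like:

```shell
# Hypothetical sketch: write a PR description containing a reproducer-cmd
# fenced block, then extract the command lines between the fences.
# FENCE holds the literal triple backticks so this example stays readable.
FENCE='```'
TMP=$(mktemp)
cat > "$TMP" <<EOF
Fixes an accuracy issue on XPU.

${FENCE}reproducer-cmd
python -m pytest test/repro/test_example.py -v
${FENCE}
EOF

# Print the block body: the range between the opening and closing fence,
# with the fence lines themselves stripped off.
PR_CMD=$(sed -n '/^```reproducer-cmd$/,/^```$/p' "$TMP" | sed '1d;$d')
echo "$PR_CMD"
```

The extracted `PR_CMD` would then be handed to the reproducer workflow rather than run in place.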

Changes

  • .github/workflows/pull.yml: Lint steps get continue-on-error + comment step; reproducer check supports PR description reproducer-cmd blocks; passes pr_body_cmd output to reproducer workflow
  • .github/workflows/_linux_reproducer.yml: New pr_body_cmd input; split into reproducer-files and reproducer-cmd steps; PR description commands run from pytorch/third_party/torch-xpu-ops/

…ands and lint failure comments

- Support reproducer commands in PR description via `reproducer-cmd` fenced blocks
- Comment @copilot on lint check failures for ai_generated PRs
- Split reproducer test into two steps: changed repro files and PR description command
- Only run added/modified test/repro/ files (git diff --diff-filter=AM)
- PR description commands run from pytorch/third_party/torch-xpu-ops/
Copilot AI review requested due to automatic review settings April 27, 2026 03:40

Copilot AI left a comment


Pull request overview

This PR updates the CI workflows for ai_generated pull requests to (a) provide clearer automation feedback (lint failure comments) and (b) expand “reproducer” support to include commands embedded in the PR description, while reducing reproducer runtime by focusing on changed reproducer files.

Changes:

  • Add lint continue-on-error with an automated @copilot comment on lint failures for ai_generated PRs, while still failing the job afterward.
  • Extend the reproducer requirement to accept PR-description `reproducer-cmd` blocks, and pass extracted commands into the reproducer workflow.
  • Update Linux reproducer workflow to run only added/modified reproducer files and to run PR-description commands in a dedicated step.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| .github/workflows/pull.yml | Adds lint-failure commenting for AI PRs; extracts PR-body reproducer commands; passes extracted commands to the reproducer workflow. |
| .github/workflows/_linux_reproducer.yml | Accepts `pr_body_cmd` input; runs only changed reproducer files; adds a separate step to run PR-body reproducer commands. |

Skill file(s) read: .github/skills/xpu-ops-pr-review/SKILL.md.

Comment on lines 122 to +124

```diff
       - name: Write summary
-        if: ${{ steps.reproducer.outcome == 'success' }}
-        run: echo "✅ Reproducer test passed. All tests in \`test/repro/\` executed successfully." >> $GITHUB_STEP_SUMMARY
+        if: ${{ steps.reproducer-files.outcome != 'failure' && steps.reproducer-cmd.outcome != 'failure' }}
+        run: echo "✅ Reproducer test passed. All reproducer tests executed successfully." >> $GITHUB_STEP_SUMMARY
```
Comment on lines 96 to 111

```diff
-          # Check if reproducer files exist
-          if ! ls test/repro/test_*.py 1>/dev/null 2>&1; then
-            FAIL_MSG="No reproducer test found. AI-generated PRs must include at least one reproducer in \`test/repro/test_*.py\`."
-          else
-            # Validate pytest format: must contain test functions or test classes
-            VALID=false
+          if ls test/repro/test_*.py 1>/dev/null 2>&1; then
+            # Validate pytest format
             for f in test/repro/test_*.py; do
               if grep -qE '^[[:space:]]*(def test_|class Test)' "$f"; then
-                VALID=true
+                HAS_REPRO_FILES=true
               else
                 FAIL_MSG="Reproducer \`$f\` is not in pytest format. It must contain pytest-style test functions (\`def test_...\`) or test classes (\`class Test...\`)."
                 break
               fi
             done
-            if [ "$VALID" = true ] && [ -z "$FAIL_MSG" ]; then
-              echo "All reproducer tests are valid pytest format:"
+            if [ "$HAS_REPRO_FILES" = true ] && [ -z "$FAIL_MSG" ]; then
+              echo "Found valid reproducer files:"
               ls test/repro/test_*.py
             fi
           fi
```
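
The `grep` check above can be exercised on its own; a minimal sketch with a made-up reproducer file (the file contents are illustrative):

```shell
# Demo of the pytest-format check: a file passes if it defines a
# pytest-style test function (def test_...) or a Test class at any
# indentation level.
TMP=$(mktemp)
cat > "$TMP" <<'EOF'
import torch

def test_add():
    assert 1 + 1 == 2
EOF

if grep -qE '^[[:space:]]*(def test_|class Test)' "$TMP"; then
  RESULT=valid
else
  RESULT=invalid
fi
echo "$RESULT"
```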
Excerpt from the lint-failure comment step: it runs only when a lint check failed on a PR labeled `ai_generated`, and authenticates with `MERGE_TOKEN`:

```yaml
          ${{ (steps.lint-python.outcome == 'failure' || steps.lint-clang.outcome == 'failure')
          && contains(github.event.pull_request.labels.*.name, 'ai_generated') }}
        env:
          GH_TOKEN: ${{ secrets.MERGE_TOKEN }}
```
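
The comment itself could be posted with the `gh` CLI. The following is a rough, hypothetical sketch: the PR number, job URL, and message wording are placeholders, not the workflow's actual values, and the `gh` call is shown but not executed since it needs an authenticated token:

```shell
# Hypothetical values; in the real workflow these would come from the
# GitHub event payload and the failed job's context.
PR_NUMBER=3488
JOB_URL="https://github.com/example/repo/actions/runs/1/job/1"

BODY="@copilot The Python lint / Clang format check failed on this ai_generated PR.
Please fix the reported issues. Failing job: ${JOB_URL}"
echo "$BODY"

# Posting the comment (requires gh authenticated via GH_TOKEN):
#   gh pr comment "$PR_NUMBER" --body "$BODY"
```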
Comment on lines +125 to +128

```shell
          # Use delimiter for multiline output
          echo "pr_cmd<<ENDOFCMD" >> "$GITHUB_OUTPUT"
          echo "$PR_CMD" >> "$GITHUB_OUTPUT"
          echo "ENDOFCMD" >> "$GITHUB_OUTPUT"
```
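
The `name<<DELIMITER ... DELIMITER` form above is how GitHub Actions encodes multiline step outputs in `$GITHUB_OUTPUT`. A standalone demo, with a temp file standing in for the real output file and an illustrative two-line command value:

```shell
# Simulate GITHUB_OUTPUT with a temp file and write a multiline value
# using the heredoc-style delimiter format that Actions parses.
GITHUB_OUTPUT=$(mktemp)
PR_CMD='python -m pytest test/repro/test_a.py
python -m pytest test/repro/test_b.py'

echo "pr_cmd<<ENDOFCMD" >> "$GITHUB_OUTPUT"
echo "$PR_CMD" >> "$GITHUB_OUTPUT"
echo "ENDOFCMD" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

The closing delimiter must appear alone on its own line, which is why a delimiter unlikely to occur in the value (here `ENDOFCMD`) is used.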
Comment on lines +73 to +80

```yaml
      - name: Get changed reproducer files
        id: changed-repro
        run: |
          cd torch-xpu-ops
          CHANGED_FILES=$(git diff --name-only --diff-filter=AM origin/${{ github.event.pull_request.base.ref }}...HEAD -- test/repro/test_*.py || true)
          echo "files<<EOF" >> "$GITHUB_OUTPUT"
          echo "$CHANGED_FILES" >> "$GITHUB_OUTPUT"
          echo "EOF" >> "$GITHUB_OUTPUT"
```
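
`--diff-filter=AM` restricts the listing to Added and Modified paths, so reproducers deleted in the PR are never scheduled. A self-contained demo of that behavior (the throwaway repo and file names are made up):

```shell
# Demo of git diff --diff-filter=AM: only Added and Modified files are
# listed; deletions are filtered out. All names here are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci

mkdir -p test/repro
echo 'def test_old(): pass' > test/repro/test_old.py
echo 'def test_gone(): pass' > test/repro/test_gone.py
git add -A && git commit -qm base
BASE=$(git rev-parse --abbrev-ref HEAD)

git checkout -qb pr-branch
echo 'def test_new(): pass' > test/repro/test_new.py      # Added
echo 'def test_old_v2(): pass' > test/repro/test_old.py   # Modified
git rm -q test/repro/test_gone.py                         # Deleted (filtered out)
git add -A && git commit -qm change

CHANGED=$(git diff --name-only --diff-filter=AM "$BASE"...HEAD -- 'test/repro/test_*.py')
echo "$CHANGED"
```

The three-dot `BASE...HEAD` form diffs against the merge base, matching how the workflow compares the PR branch to its base ref.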
Comment on lines +110 to +114

```shell
          set -o pipefail
          cd pytorch/third_party/torch-xpu-ops
          echo "=== Running reproducer command from PR description ==="
          echo "Command: $PR_BODY_CMD"
          eval "$PR_BODY_CMD" 2>&1 | tee -a ${{ github.workspace }}/reproducer_output.txt
```
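
`set -o pipefail` matters in that step because without it the pipeline's exit status is `tee`'s (almost always 0), so a failing reproducer command would be silently swallowed. A quick bash demonstration:

```shell
# Without pipefail, the pipeline reports tee's exit status, hiding the failure.
set +o pipefail
false | tee /dev/null
NO_PIPEFAIL=$?

# With pipefail, the rightmost non-zero status in the pipeline wins.
set -o pipefail
false | tee /dev/null
WITH_PIPEFAIL=$?

echo "without pipefail: $NO_PIPEFAIL, with pipefail: $WITH_PIPEFAIL"
# → without pipefail: 0, with pipefail: 1
```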
@github-actions

Performance outliers, please check!

  • 🔴 [-1, 80%): likely a regression

| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| torchbench_bfloat16_training | pytorch_unet | 0.740260 | 0.708482 |
| huggingface_bfloat16_training | XLNetLMHeadModel | 0.739403 | 0.743147 |
| huggingface_bfloat16_training | TrOCRForCausalLM | 0.762826 | 0.743547 |
| huggingface_float16_training | TrOCRForCausalLM | 0.714036 | 0.743744 |
| huggingface_float16_training | BartForCausalLM | 0.767779 | 0.749210 |
| torchbench_bfloat16_training | nvidia_deeprecommender | 0.728536 | 0.749413 |
| huggingface_float16_training | DistilBertForMaskedLM | 0.727101 | 0.750550 |
| torchbench_bfloat16_training | mobilenet_v2 | 0.852048 | 0.753404 |
| huggingface_bfloat16_training | BartForCausalLM | 0.785669 | 0.754752 |
| huggingface_bfloat16_training | DistilBertForMaskedLM | 0.785930 | 0.759729 |
| huggingface_bfloat16_training | AllenaiLongformerBase | 0.664278 | 0.761543 |
| torchbench_bfloat16_training | alexnet | 0.786356 | 0.761758 |
| huggingface_float16_training | PLBartForCausalLM | 0.753717 | 0.762068 |
| huggingface_bfloat16_training | LayoutLMForMaskedLM | 0.748089 | 0.763326 |
| huggingface_float16_training | XLNetLMHeadModel | 0.689178 | 0.764898 |
| huggingface_bfloat16_training | RobertaForCausalLM | 0.791742 | 0.765411 |
| huggingface_float16_training | MBartForCausalLM | 0.742784 | 0.767255 |
| torchbench_bfloat16_training | Background_Matting | 0.735044 | 0.767519 |
| huggingface_bfloat16_training | PLBartForCausalLM | 0.788660 | 0.769228 |
| huggingface_float16_training | LayoutLMForMaskedLM | 0.712256 | 0.769820 |
| huggingface_float16_training | BertForMaskedLM | 0.744970 | 0.773467 |
| huggingface_bfloat16_training | MBartForCausalLM | 0.761708 | 0.775689 |
| torchbench_bfloat16_training | resnet50 | 0.792881 | 0.776717 |
| huggingface_float16_training | RobertaForCausalLM | 0.758265 | 0.777810 |
| huggingface_bfloat16_training | BertForMaskedLM | 0.785431 | 0.780042 |
| huggingface_float16_training | DistillGPT2 | 0.725715 | 0.783647 |
| huggingface_float16_training | YituTechConvBert | 0.712449 | 0.787205 |
| huggingface_bfloat16_training | OPTForCausalLM | 0.807629 | 0.790531 |
| huggingface_bfloat16_training | ElectraForCausalLM | 0.796400 | 0.792203 |
| huggingface_float16_training | OPTForCausalLM | 0.799433 | 0.793982 |
| huggingface_bfloat16_training | DistillGPT2 | 0.771338 | 0.795427 |
| huggingface_float16_training | PegasusForCausalLM | 0.755863 | 0.796515 |
| huggingface_float16_training | ElectraForCausalLM | 0.756302 | 0.801988 |
| huggingface_bfloat16_training | YituTechConvBert | 0.744741 | 0.804315 |
| huggingface_bfloat16_training | PegasusForCausalLM | 0.766656 | 0.810278 |
| huggingface_float16_training | AlbertForMaskedLM | 0.780563 | 0.827444 |
| huggingface_float16_training | T5ForConditionalGeneration | 0.788676 | 0.831406 |
| huggingface_float16_training | AllenaiLongformerBase | 0.657059 | 0.834163 |
| huggingface_float16_training | T5Small | 0.797810 | 0.849514 |
| torchbench_bfloat16_training | vgg16 | 0.754026 | 0.865334 |
| huggingface_bfloat16_training | GPT2ForSequenceClassification | 0.786609 | 0.887845 |
| huggingface_float16_training | GPT2ForSequenceClassification | 0.748403 | 0.891828 |
| torchbench_bfloat16_training | shufflenet_v2_x1_0 | 0.799036 | 0.908651 |
| torchbench_bfloat16_training | LearningToPaint | 0.697964 | 0.942506 |
  • 🟡 [80%, 90%): may be fluctuations

| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| huggingface_float16_training | MegatronBertForCausalLM | 0.817378 | 0.810391 |
| huggingface_bfloat16_training | MegatronBertForCausalLM | 0.848974 | 0.823613 |
| huggingface_float16_training | M2M100ForConditionalGeneration | 0.851691 | 0.838449 |
| huggingface_float16_training | BlenderbotForCausalLM | 0.848699 | 0.850853 |
| huggingface_bfloat16_training | AlbertForMaskedLM | 0.814382 | 0.852019 |
| huggingface_bfloat16_training | T5ForConditionalGeneration | 0.832136 | 0.857793 |
| huggingface_bfloat16_training | T5Small | 0.829986 | 0.860213 |
| timm_models_bfloat16_training | mobilevit_s | 1.040174 | 0.862857 |
| huggingface_bfloat16_training | M2M100ForConditionalGeneration | 0.888709 | 0.867597 |
| huggingface_bfloat16_training | GoogleFnet | 0.837960 | 0.873194 |
| huggingface_float16_training | GoogleFnet | 0.837320 | 0.873942 |
| huggingface_float16_training | DebertaV2ForMaskedLM | 0.888092 | 0.881864 |
| huggingface_bfloat16_training | DebertaV2ForMaskedLM | 0.909572 | 0.882427 |
| timm_models_bfloat16_training | tf_efficientnet_b0 | 1.051259 | 0.887072 |
| huggingface_bfloat16_training | BlenderbotForCausalLM | 0.886350 | 0.897623 |
| huggingface_float16_training | XGLMForCausalLM | 0.822459 | 0.910660 |
| huggingface_bfloat16_training | XGLMForCausalLM | 0.868734 | 0.910866 |
| torchbench_bfloat16_training | squeezenet1_1 | 0.844351 | 1.031531 |
