
[CI] Enhance AI-generated PR workflow with PR description reproducers and lint failure comments #3488

Open
chuanqi129 wants to merge 1 commit into `main` from `ci/ai-reproducer-v2`

Conversation

@chuanqi129
Contributor

Summary

  • Support PR description reproducer commands: AI-generated PRs can now specify UT commands to run via `reproducer-cmd` fenced code blocks in the PR description, in addition to (or instead of) `test/repro/test_*.py` files
  • Comment @copilot on lint failures: When Python lint or Clang format checks fail on ai_generated PRs, automatically comment @copilot with the failure details and job link
  • Only run changed reproducer files: The reproducer files step now only runs files that were added or modified in the PR (via git diff --diff-filter=AM), not all files in test/repro/
  • Split reproducer into two steps: Reproducer file tests and PR description commands are now separate workflow steps for clearer logs
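
As a hedged sketch of the feature described above (the file name, command, and `sed` extraction are illustrative placeholders, not the workflow's actual implementation), a PR body carrying a `reproducer-cmd` block and one way to pull the command out of it might look like:

```shell
# Hypothetical sketch: write a PR description containing a reproducer-cmd
# fenced block, then extract the command lines between the fences.
# FENCE holds the literal triple backticks so this example stays readable.
FENCE='```'
TMP=$(mktemp)
cat > "$TMP" <<EOF
Fixes an accuracy issue on XPU.

${FENCE}reproducer-cmd
python -m pytest test/repro/test_example.py -v
${FENCE}
EOF

# Print the block body: the range between the opening and closing fence,
# with the fence lines themselves stripped off.
PR_CMD=$(sed -n '/^```reproducer-cmd$/,/^```$/p' "$TMP" | sed '1d;$d')
echo "$PR_CMD"
```

The extracted `PR_CMD` would then be handed to the reproducer workflow rather than run in place.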

Changes

  • .github/workflows/pull.yml: Lint steps get continue-on-error + comment step; reproducer check supports PR description reproducer-cmd blocks; passes pr_body_cmd output to reproducer workflow
  • .github/workflows/_linux_reproducer.yml: New pr_body_cmd input; split into reproducer-files and reproducer-cmd steps; PR description commands run from pytorch/third_party/torch-xpu-ops/

…ands and lint failure comments

- Support reproducer commands in PR description via `reproducer-cmd` fenced blocks
- Comment @copilot on lint check failures for ai_generated PRs
- Split reproducer test into two steps: changed repro files and PR description command
- Only run added/modified test/repro/ files (git diff --diff-filter=AM)
- PR description commands run from pytorch/third_party/torch-xpu-ops/
Copilot AI review requested due to automatic review settings April 27, 2026 03:40

Copilot AI left a comment


Pull request overview

This PR updates the CI workflows for ai_generated pull requests to (a) provide clearer automation feedback (lint failure comments) and (b) expand “reproducer” support to include commands embedded in the PR description, while reducing reproducer runtime by focusing on changed reproducer files.

Changes:

  • Add lint continue-on-error with an automated @copilot comment on lint failures for ai_generated PRs, while still failing the job afterward.
  • Extend the reproducer requirement to accept PR-description `reproducer-cmd` blocks, and pass extracted commands into the reproducer workflow.
  • Update Linux reproducer workflow to run only added/modified reproducer files and to run PR-description commands in a dedicated step.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| .github/workflows/pull.yml | Adds lint-failure commenting for AI PRs; extracts PR-body reproducer commands; passes extracted commands to the reproducer workflow. |
| .github/workflows/_linux_reproducer.yml | Accepts `pr_body_cmd` input; runs only changed reproducer files; adds a separate step to run PR-body reproducer commands. |

Skill file(s) read: .github/skills/xpu-ops-pr-review/SKILL.md.

Comment on lines 122 to +124

```diff
       - name: Write summary
-        if: ${{ steps.reproducer.outcome == 'success' }}
-        run: echo "✅ Reproducer test passed. All tests in \`test/repro/\` executed successfully." >> $GITHUB_STEP_SUMMARY
+        if: ${{ steps.reproducer-files.outcome != 'failure' && steps.reproducer-cmd.outcome != 'failure' }}
+        run: echo "✅ Reproducer test passed. All reproducer tests executed successfully." >> $GITHUB_STEP_SUMMARY
```
Comment on lines 96 to 111

```diff
-          # Check if reproducer files exist
-          if ! ls test/repro/test_*.py 1>/dev/null 2>&1; then
-            FAIL_MSG="No reproducer test found. AI-generated PRs must include at least one reproducer in \`test/repro/test_*.py\`."
-          else
-            # Validate pytest format: must contain test functions or test classes
-            VALID=false
+          if ls test/repro/test_*.py 1>/dev/null 2>&1; then
+            # Validate pytest format
             for f in test/repro/test_*.py; do
               if grep -qE '^[[:space:]]*(def test_|class Test)' "$f"; then
-                VALID=true
+                HAS_REPRO_FILES=true
               else
                 FAIL_MSG="Reproducer \`$f\` is not in pytest format. It must contain pytest-style test functions (\`def test_...\`) or test classes (\`class Test...\`)."
                 break
               fi
             done
-            if [ "$VALID" = true ] && [ -z "$FAIL_MSG" ]; then
-              echo "All reproducer tests are valid pytest format:"
+            if [ "$HAS_REPRO_FILES" = true ] && [ -z "$FAIL_MSG" ]; then
+              echo "Found valid reproducer files:"
               ls test/repro/test_*.py
             fi
           fi
```
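
The `grep` check above can be exercised on its own; a minimal sketch with a made-up reproducer file (the file contents are illustrative):

```shell
# Demo of the pytest-format check: a file passes if it defines a
# pytest-style test function (def test_...) or a Test class at any
# indentation level.
TMP=$(mktemp)
cat > "$TMP" <<'EOF'
import torch

def test_add():
    assert 1 + 1 == 2
EOF

if grep -qE '^[[:space:]]*(def test_|class Test)' "$TMP"; then
  RESULT=valid
else
  RESULT=invalid
fi
echo "$RESULT"
```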
Excerpt from the lint-failure comment step: it runs only when a lint check failed on a PR labeled `ai_generated`, and authenticates with `MERGE_TOKEN`:

```yaml
          ${{ (steps.lint-python.outcome == 'failure' || steps.lint-clang.outcome == 'failure')
          && contains(github.event.pull_request.labels.*.name, 'ai_generated') }}
        env:
          GH_TOKEN: ${{ secrets.MERGE_TOKEN }}
```
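
The comment itself could be posted with the `gh` CLI. The following is a rough, hypothetical sketch: the PR number, job URL, and message wording are placeholders, not the workflow's actual values, and the `gh` call is shown but not executed since it needs an authenticated token:

```shell
# Hypothetical values; in the real workflow these would come from the
# GitHub event payload and the failed job's context.
PR_NUMBER=3488
JOB_URL="https://github.com/example/repo/actions/runs/1/job/1"

BODY="@copilot The Python lint / Clang format check failed on this ai_generated PR.
Please fix the reported issues. Failing job: ${JOB_URL}"
echo "$BODY"

# Posting the comment (requires gh authenticated via GH_TOKEN):
#   gh pr comment "$PR_NUMBER" --body "$BODY"
```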
Comment on lines +125 to +128

```shell
          # Use delimiter for multiline output
          echo "pr_cmd<<ENDOFCMD" >> "$GITHUB_OUTPUT"
          echo "$PR_CMD" >> "$GITHUB_OUTPUT"
          echo "ENDOFCMD" >> "$GITHUB_OUTPUT"
```
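
The `name<<DELIMITER ... DELIMITER` form above is how GitHub Actions encodes multiline step outputs in `$GITHUB_OUTPUT`. A standalone demo, with a temp file standing in for the real output file and an illustrative two-line command value:

```shell
# Simulate GITHUB_OUTPUT with a temp file and write a multiline value
# using the heredoc-style delimiter format that Actions parses.
GITHUB_OUTPUT=$(mktemp)
PR_CMD='python -m pytest test/repro/test_a.py
python -m pytest test/repro/test_b.py'

echo "pr_cmd<<ENDOFCMD" >> "$GITHUB_OUTPUT"
echo "$PR_CMD" >> "$GITHUB_OUTPUT"
echo "ENDOFCMD" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

The closing delimiter must appear alone on its own line, which is why a delimiter unlikely to occur in the value (here `ENDOFCMD`) is used.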
Comment on lines +73 to +80

```yaml
      - name: Get changed reproducer files
        id: changed-repro
        run: |
          cd torch-xpu-ops
          CHANGED_FILES=$(git diff --name-only --diff-filter=AM origin/${{ github.event.pull_request.base.ref }}...HEAD -- test/repro/test_*.py || true)
          echo "files<<EOF" >> "$GITHUB_OUTPUT"
          echo "$CHANGED_FILES" >> "$GITHUB_OUTPUT"
          echo "EOF" >> "$GITHUB_OUTPUT"
```
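
`--diff-filter=AM` restricts the listing to Added and Modified paths, so reproducers deleted in the PR are never scheduled. A self-contained demo of that behavior (the throwaway repo and file names are made up):

```shell
# Demo of git diff --diff-filter=AM: only Added and Modified files are
# listed; deletions are filtered out. All names here are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci

mkdir -p test/repro
echo 'def test_old(): pass' > test/repro/test_old.py
echo 'def test_gone(): pass' > test/repro/test_gone.py
git add -A && git commit -qm base
BASE=$(git rev-parse --abbrev-ref HEAD)

git checkout -qb pr-branch
echo 'def test_new(): pass' > test/repro/test_new.py      # Added
echo 'def test_old_v2(): pass' > test/repro/test_old.py   # Modified
git rm -q test/repro/test_gone.py                         # Deleted (filtered out)
git add -A && git commit -qm change

CHANGED=$(git diff --name-only --diff-filter=AM "$BASE"...HEAD -- 'test/repro/test_*.py')
echo "$CHANGED"
```

The three-dot `BASE...HEAD` form diffs against the merge base, matching how the workflow compares the PR branch to its base ref.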
Comment on lines +110 to +114

```shell
          set -o pipefail
          cd pytorch/third_party/torch-xpu-ops
          echo "=== Running reproducer command from PR description ==="
          echo "Command: $PR_BODY_CMD"
          eval "$PR_BODY_CMD" 2>&1 | tee -a ${{ github.workspace }}/reproducer_output.txt
```
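
`set -o pipefail` matters in that step because without it the pipeline's exit status is `tee`'s (almost always 0), so a failing reproducer command would be silently swallowed. A quick bash demonstration:

```shell
# Without pipefail, the pipeline reports tee's exit status, hiding the failure.
set +o pipefail
false | tee /dev/null
NO_PIPEFAIL=$?

# With pipefail, the rightmost non-zero status in the pipeline wins.
set -o pipefail
false | tee /dev/null
WITH_PIPEFAIL=$?

echo "without pipefail: $NO_PIPEFAIL, with pipefail: $WITH_PIPEFAIL"
# → without pipefail: 0, with pipefail: 1
```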
@github-actions

Performance outliers, please check!

  • 🔴 [-1, 80%): likely a regression

| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| torchbench_bfloat16_training | pytorch_unet | 0.740260 | 0.708482 |
| huggingface_bfloat16_training | XLNetLMHeadModel | 0.739403 | 0.743147 |
| huggingface_bfloat16_training | TrOCRForCausalLM | 0.762826 | 0.743547 |
| huggingface_float16_training | TrOCRForCausalLM | 0.714036 | 0.743744 |
| huggingface_float16_training | BartForCausalLM | 0.767779 | 0.749210 |
| torchbench_bfloat16_training | nvidia_deeprecommender | 0.728536 | 0.749413 |
| huggingface_float16_training | DistilBertForMaskedLM | 0.727101 | 0.750550 |
| torchbench_bfloat16_training | mobilenet_v2 | 0.852048 | 0.753404 |
| huggingface_bfloat16_training | BartForCausalLM | 0.785669 | 0.754752 |
| huggingface_bfloat16_training | DistilBertForMaskedLM | 0.785930 | 0.759729 |
| huggingface_bfloat16_training | AllenaiLongformerBase | 0.664278 | 0.761543 |
| torchbench_bfloat16_training | alexnet | 0.786356 | 0.761758 |
| huggingface_float16_training | PLBartForCausalLM | 0.753717 | 0.762068 |
| huggingface_bfloat16_training | LayoutLMForMaskedLM | 0.748089 | 0.763326 |
| huggingface_float16_training | XLNetLMHeadModel | 0.689178 | 0.764898 |
| huggingface_bfloat16_training | RobertaForCausalLM | 0.791742 | 0.765411 |
| huggingface_float16_training | MBartForCausalLM | 0.742784 | 0.767255 |
| torchbench_bfloat16_training | Background_Matting | 0.735044 | 0.767519 |
| huggingface_bfloat16_training | PLBartForCausalLM | 0.788660 | 0.769228 |
| huggingface_float16_training | LayoutLMForMaskedLM | 0.712256 | 0.769820 |
| huggingface_float16_training | BertForMaskedLM | 0.744970 | 0.773467 |
| huggingface_bfloat16_training | MBartForCausalLM | 0.761708 | 0.775689 |
| torchbench_bfloat16_training | resnet50 | 0.792881 | 0.776717 |
| huggingface_float16_training | RobertaForCausalLM | 0.758265 | 0.777810 |
| huggingface_bfloat16_training | BertForMaskedLM | 0.785431 | 0.780042 |
| huggingface_float16_training | DistillGPT2 | 0.725715 | 0.783647 |
| huggingface_float16_training | YituTechConvBert | 0.712449 | 0.787205 |
| huggingface_bfloat16_training | OPTForCausalLM | 0.807629 | 0.790531 |
| huggingface_bfloat16_training | ElectraForCausalLM | 0.796400 | 0.792203 |
| huggingface_float16_training | OPTForCausalLM | 0.799433 | 0.793982 |
| huggingface_bfloat16_training | DistillGPT2 | 0.771338 | 0.795427 |
| huggingface_float16_training | PegasusForCausalLM | 0.755863 | 0.796515 |
| huggingface_float16_training | ElectraForCausalLM | 0.756302 | 0.801988 |
| huggingface_bfloat16_training | YituTechConvBert | 0.744741 | 0.804315 |
| huggingface_bfloat16_training | PegasusForCausalLM | 0.766656 | 0.810278 |
| huggingface_float16_training | AlbertForMaskedLM | 0.780563 | 0.827444 |
| huggingface_float16_training | T5ForConditionalGeneration | 0.788676 | 0.831406 |
| huggingface_float16_training | AllenaiLongformerBase | 0.657059 | 0.834163 |
| huggingface_float16_training | T5Small | 0.797810 | 0.849514 |
| torchbench_bfloat16_training | vgg16 | 0.754026 | 0.865334 |
| huggingface_bfloat16_training | GPT2ForSequenceClassification | 0.786609 | 0.887845 |
| huggingface_float16_training | GPT2ForSequenceClassification | 0.748403 | 0.891828 |
| torchbench_bfloat16_training | shufflenet_v2_x1_0 | 0.799036 | 0.908651 |
| torchbench_bfloat16_training | LearningToPaint | 0.697964 | 0.942506 |
  • 🟡 [80%, 90%): may be fluctuations

| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| huggingface_float16_training | MegatronBertForCausalLM | 0.817378 | 0.810391 |
| huggingface_bfloat16_training | MegatronBertForCausalLM | 0.848974 | 0.823613 |
| huggingface_float16_training | M2M100ForConditionalGeneration | 0.851691 | 0.838449 |
| huggingface_float16_training | BlenderbotForCausalLM | 0.848699 | 0.850853 |
| huggingface_bfloat16_training | AlbertForMaskedLM | 0.814382 | 0.852019 |
| huggingface_bfloat16_training | T5ForConditionalGeneration | 0.832136 | 0.857793 |
| huggingface_bfloat16_training | T5Small | 0.829986 | 0.860213 |
| timm_models_bfloat16_training | mobilevit_s | 1.040174 | 0.862857 |
| huggingface_bfloat16_training | M2M100ForConditionalGeneration | 0.888709 | 0.867597 |
| huggingface_bfloat16_training | GoogleFnet | 0.837960 | 0.873194 |
| huggingface_float16_training | GoogleFnet | 0.837320 | 0.873942 |
| huggingface_float16_training | DebertaV2ForMaskedLM | 0.888092 | 0.881864 |
| huggingface_bfloat16_training | DebertaV2ForMaskedLM | 0.909572 | 0.882427 |
| timm_models_bfloat16_training | tf_efficientnet_b0 | 1.051259 | 0.887072 |
| huggingface_bfloat16_training | BlenderbotForCausalLM | 0.886350 | 0.897623 |
| huggingface_float16_training | XGLMForCausalLM | 0.822459 | 0.910660 |
| huggingface_bfloat16_training | XGLMForCausalLM | 0.868734 | 0.910866 |
| torchbench_bfloat16_training | squeezenet1_1 | 0.844351 | 1.031531 |
