Skip to content

Automated GitHub workflow to review card-script PRs for errors#10984

Draft
MostCromulent wants to merge 5 commits into
Card-Forge:masterfrom
MostCromulent:card-script-review-ci
Draft

Automated GitHub workflow to review card-script PRs for errors#10984
MostCromulent wants to merge 5 commits into
Card-Forge:masterfrom
MostCromulent:card-script-review-ci

Conversation

@MostCromulent

Copy link
Copy Markdown
Contributor

Proof of concept — the workflow commenting on deliberately-broken cards on a green build: MostCromulent#23

Summary

Adds a GitHub Actions workflow to check card-script PRs for errors: a deterministic linter that flags mechanical errors in the script (undefined references, duplicate or unknown params, malformed syntax), plus a Scryfall fact-check that compares the card frame to the printed card. The workflow posts inline comments, does not block merge.

It checks mechanical correctness and frame accuracy only — not gameplay, AI, or whether an ability is implemented correctly — so it complements human review rather than replacing it.

What it flags

Deterministic linter (cardlint.py):

Catches Example comment
Undefined Execute$/SubAbility$/Choices$/Chapter SVar `Execute$ TrigNope` → undefined SVar (card won't load)
Duplicate param duplicate `ValidTgts$` → engine keeps only the last
Param-key typo (one edit from a real key) `ValidTgt$` → `ValidTgts$`
Param found in no other card (made-up / outdated) `Foozle$` → appears in no other card (unknown or outdated param?)
Illegal mana-cost token illegal mana token `R2`
Non-Land card with no ManaCost line missing a `ManaCost` line
Loyalty ability missing Planeswalker$ True loyalty ability needs `Planeswalker$ True`
Trigger-only token on a directly-activated A: ability `TriggeredSourceController` is a trigger-only token on an `A:` line
Comma list where the engine splits on & `AddKeywords$` is comma-separated, but the engine splits on ` & `
Property filter with multiple . (matches nothing) `Spirit.You.Ctrl` has multiple `.` → matches nothing
Miscased Defined$ / zone token `Defined$ SELF` → `Self` (case-sensitive)
Card's own name written literally in a description literal `Krenko` in the description → use NICKNAME / CARDNAME
SpellDescription restating the activation cost SpellDescription restates the activation cost
SVar that's never referenced SVar `TrigDraw` is never referenced (clause never fires)
Malformed line prefix / key spacing / curly apostrophe miscased prefix `Svar:` (want `SVar:`)

Scryfall frame fact-check (card_script_review.py), each compared against the printed card:

Catches Example comment
Wrong or missing type / supertype Types `Artifact Vehicle` → `Legendary Artifact Vehicle`
Wrong power/toughness PT `4/2` → `3/2`
Wrong mana cost ManaCost `3 U U` → `2 U U`
Wrong loyalty Loyalty `4` → `5`
Name transcription typo Name `Slithering` → `Slithering Cryptid`

How it works

  • Runs on pull_request_target so it can comment on fork PRs; comments only on lines the PR added or changed; de-dupes across re-pushes, so later commits only get comments on the newly-touched cards.

How it was developed

The checks come from what maintainers actually flag, not guesswork: ~530 maintainer review comments on card-script PRs since 1 January 2026 (across ~120 PRs) were grouped by issue, and a deterministic check was added for each recurring mechanical mistake. The post-merge fix-up commits — errors that slipped past review, like a missing Legendary or a wrong cost — motivated the Scryfall frame check. Judgment-heavy issues were left to human review.

Why it's safe

  • Advisory — it never blocks merge, so a stray comment costs a glance, not a broken build.
  • Never executes PR code — runs only the base-branch scripts and reads the PR's cardsfolder/*.txt files as data; the changed-file filter excludes .github/**, so a PR can't substitute its own linter. Token scope is pull-requests: write and contents: read.
  • Low false-positive by design — most checks flag objective malformations, not judgment calls; the two judgment-prone ones (UNKNOWN-KEY and the frame check) stay conservative and skip what they can't be sure of.

Reviews changed card scripts on a pull request and posts terse inline comments:
a deterministic linter (undefined SVar references, duplicate/unknown params,
illegal mana tokens, missing ManaCost, loyalty abilities missing Planeswalker)
plus a Scryfall frame fact-check (name, type line, P/T, mana cost, loyalty).
Advisory only -- it never fails the check, it only comments.

Runs on pull_request_target so it can comment on fork PRs; it only reads the
PR's cardsfolder/*.txt files as data and runs the base-branch scripts, never
executing PR code. The linter derives its known params from the card corpus, so
it adapts when the engine adds or renames a param.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tool4ever

Copy link
Copy Markdown
Contributor

Nice, that'll surely be a big timesaver

I wonder if there's some way to reuse related fields from CardScriptParser though or if sharing between the languages would get too messy 🤔

@MostCromulent

Copy link
Copy Markdown
Contributor Author

I wonder if there's some way to reuse related fields from CardScriptParser though or if sharing between the languages would get too messy 🤔

Claude analysis:

Most of the current checks are structural or lexical and need no engine vocabulary.

The exception is the unknown/typo'd-param checks (UNKNOWN-KEY, KEY-TYPO), and there's nothing to reuse for them: params are parsed into a flat map and read ad-hoc via getParam("X") with unknown keys silently ignored, so there's no authoritative key list. (CardScriptParser flags unrecognised keys, but only against a per-context whitelist far narrower than the params in use.) So those would have to stay corpus-based.

Where reusing CardScriptParser could be useful is adding a couple of new checks rather than changing existing ones:

  • Unknown API / trigger / replacement type — validating the ability (AB$/SP$/ST$/DB$), trigger Mode$, and replacement Event$ values against ApiType/TriggerType/ReplacementType; a typo'd API (e.g. DB$ DealDamge) would be caught, which the linter currently doesn't.
  • Defined$ / Valid$ token validity — against CardScriptParser's DEFINED_*/VALID_* sets; covers the Defined$ half of the current miscased-token check (zone casing would still need ZoneType).

If this was adopted then to share without a JVM at lint time a build step would need to export those sets/enums to a small JSON the linter reads — Java stays the single source of truth, nothing hand-duplicated to drift.

@tool4ever

Copy link
Copy Markdown
Contributor

Right, the bonus of being fast enough to just check based on all available scripts definitely means it's preferable for this use case 👍

I wonder if you could just combine it with test-build.yaml:
Then it might be easiest to simply write a test that can run on changed_files.txt and call some more of the standalone stuff like API legality? 🤔
Especially reusing definitions like additionalAbilityKeys would be better

@tool4ever tool4ever left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v4, actions/github-script@v7, actions/setup-python@v5. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24.

MostCromulent and others added 2 commits June 13, 2026 21:42
checkout v4->v5, setup-python v5->v6, github-script v7->v8 — all
runtime-only bumps off the deprecated Node 20.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The card-script review now runs its analysis as part of the normal build
and hands the findings to a separate workflow that posts them, replacing
the single pull_request_target workflow. The reason for the split: a
pull_request build has a read-only token on fork PRs (almost all card
PRs), so it can produce findings but cannot comment. A workflow_run
commenter runs afterwards in the base repo with a write token and does
the posting.

The flow:
  - test-build runs a new pure-advisory test (CardScriptApiTest) and
    uploads its findings plus the PR number as an artifact (if: always,
    so a red build still ships findings);
  - the commenter downloads the artifact, runs the deterministic linter
    and the Scryfall frame check, and posts the merged, deduped comments
    on the changed lines only.

The new test reuses the engine's own definitions -- the same checks it
runs at card load -- rather than re-encoding them:
  - ability/trigger/replacement API names via ApiType, TriggerType and
    ReplacementType, catching a typo'd API such as DB$ DealDamge that the
    corpus-based linter cannot;
  - sub-ability params in AbilityFactory.additionalAbilityKeys must point
    at a defined SVar, closing a gap where the linter knew only a handful
    of these keys. cardlint's REF_KEYS drops the now-engine-checked keys.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MostCromulent MostCromulent requested a review from tool4ever June 13, 2026 22:14
…ct v7)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MostCromulent

Copy link
Copy Markdown
Contributor Author

Proof of concept for reworked workflow:

  • Example PR - build fails here because of an intentionally defective lightning bolt which breaks a different AI test; this is just a testing artifact
  • Build workflow - the new test adds around ~1 second.
  • Review workflow - completes within ~30 seconds after build; node.js deprecation has now been addressed,

@MostCromulent

Copy link
Copy Markdown
Contributor Author

Separately, someone will need to check all the other github workflows for the node.js deprecation as well before Node.js 20 is removed in September - particularly the release workflows.

Comment thread .github/scripts/cardlint.py Outdated
- Linter: drop reference/list keys that don't exist in the engine
  (ChosenSubAbility, AbilityX, AddStatic, AddReplacement) and use the
  engine's real key names. Split Triggers/AddTriggers/AddStaticAbilities
  on ',' — the engine's actual delimiter — instead of mis-checking them as
  ' & ' lists.
- Guard LEX-DELIM against parameterized keywords whose values legitimately
  carry internal commas (Protection:..., OnlyUntapChosen:...), removing
  pre-existing false positives. Zero LEX-DELIM findings across the corpus.
- Correct LINE_PREFIXES to match the DSL: drop SetColor, add
  HandLifeModifier, Draft, CopyFaceFrom, MeldPair, SPECIALIZE.
- Resolve the PR from the artifact's number but re-validate it against the
  build's trusted head SHA before commenting, so a tampered artifact cannot
  aim the poster at another PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tehdiplomat

Copy link
Copy Markdown
Contributor

I'm definitely in favor of this idea, but i think i'd approach it differently. Each Effect I'd define two arrays, one for required parameters and one for optional parameters. Those should theoretically be able to be pulled out of each effect based off how they are used within each class. We could either point the linter at those arrays, or inverted if something tries to use something not in the arrays lint and force users to put them in an array thats applicable. I've tried this a few times, but it was before AI was quite up to snuff to automate most of this task so didn't really finish.

@dracontes

Copy link
Copy Markdown
Contributor

I can only agree that this would be a nice thing to have 👍 For one, I'll be sure to eventually go over the code to glean a checklist for existing card scripts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants