Skip to content

Document step() action-scaling contract; add _validate_action_shares hook#1416

Open
guitaristsam wants to merge 1 commit intoAI4Finance-Foundation:masterfrom
guitaristsam:clarify-step-action-scaling
Open

Document step() action-scaling contract; add _validate_action_shares hook#1416
guitaristsam wants to merge 1 commit intoAI4Finance-Foundation:masterfrom
guitaristsam:clarify-step-action-scaling

Conversation

@guitaristsam
Copy link
Copy Markdown

Problem

StockTradingEnv.step() rescales the input actions from [-1, 1] to integer share counts via actions = actions * self.hmax; actions.astype(int) before any _buy_stock/_sell_stock call sees it. This is correct behaviour, but subclasses overriding step() to add custom action constraints commonly:

  1. Validate / round / clip the raw [-1, 1] action before the parent's * self.hmax rescaling. Net effect: the action grid collapses to {-hmax, 0, +hmax} with no fine-grained levels in between, and the policy std parameter gets stuck during PPO training (no gradient benefit to outputting fractional values).

  2. Re-implement the entire step() flow to inject custom logic, which means the subclass has to re-track FinRL's internal state-layout conventions.

What this PR changes

Two non-breaking additions to env_stocktrading.py:

  • Docstring on step() explicitly stating the [-1, 1] → integer-shares rescaling contract and pointing subclasses at the new hook.

  • _validate_action_shares(action_shares) method, called after the rescaling and turbulence override but before the buy/sell loop. Default implementation is identity, so existing subclasses are unaffected. Subclasses that want post-rescaling constraints (per-step share-delta caps, integer-only enforcement, tighter budget pre-checks, no-shorting) can override this single method instead of re-implementing step().

Total: 43 lines added, 0 lines removed, no behaviour change for existing users.

Note on FinRL-X

I'm aware development is shifting to FinRL-X, which uses a weight-centric interface and has no Gym env. This PR targets legacy FinRL specifically because the action-scaling trap affects users following the existing educational/research tutorials, which explicitly remain on this repo.

Real-world reproducer

Hit this trap while subclassing StockTradingEnv to enforce integer-share trading on Indian equities. The smoking gun was the PPO std parameter stuck at 0.97 across 200k training timesteps. Full debugging diary, including 13 versions of single-variable iteration, at https://github.com/guitaristsam/RL_PPO_NIFTY50 — see the "Action scaling" section of CLAUDE.md.

…es` hook

`StockTradingEnv.step()` rescales the input `actions` from [-1, 1] to
integer share counts via `actions = actions * self.hmax; actions.astype(int)`
before any `_buy_stock` / `_sell_stock` call sees it. This is correct, but
subclasses overriding `step()` to add custom action constraints commonly
make one of two mistakes:

1. Validate / clip / round the raw [-1, 1] action BEFORE the parent's
   `* self.hmax` rescaling. Net effect: the action grid collapses to
   {-hmax, 0, +hmax} with no fine-grained levels in between, and the
   policy `std` gets stuck during PPO training because there's no
   gradient benefit to outputting fractional values.

2. Re-implement the entire `step()` flow to inject custom logic, which
   means the subclass has to track FinRL's internal state-layout
   conventions (`state[1:1+stock_dim]` is prices, not shares — a separate
   trap that this PR does not address but is mentioned as related).

This PR adds two non-breaking changes:

- A docstring on `step()` explicitly stating the [-1, 1] -> integer-shares
  rescaling contract and pointing subclassers at the new hook.

- A `_validate_action_shares(action_shares)` method that runs after the
  scaling and turbulence override but before the buy/sell loop. Default
  implementation is identity, so existing subclasses are unaffected.
  Subclasses that want post-rescaling constraints (per-step share-delta
  caps, integer-only enforcement, tighter budget pre-checks, no-shorting)
  can now override this single method instead of re-implementing `step()`.

Note on FinRL-X: I'm aware the project is moving toward FinRL-X, which
uses a weight-centric interface and has no Gym env at all. This PR
targets legacy FinRL specifically because the action-scaling trap is an
issue for users following the existing educational/research tutorials,
which explicitly remain on this repo.

Real-world reproducer: I hit this trap while subclassing StockTradingEnv
to enforce integer-share trading on Indian equities. v6 of my project
silently collapsed the policy's action grid; the smoking gun was the
PPO `std` parameter stuck at 0.97 across all 200k training timesteps.
Tracking it down required reading line 314 of env_stocktrading.py and
the offending subclass side-by-side. Full debugging diary at
https://github.com/guitaristsam/RL_PPO_NIFTY50 (see the "Action scaling"
section of CLAUDE.md).
@guitaristsam
Copy link
Copy Markdown
Author

The failing pre-commit.ci check is not caused by this PR. Diagnostic:

  • reorder-python-imports and black are reformatting 11 unrelated files (yahoodownloader.py, processor_yahoofinance.py, models.py, etc.) — none of which are touched by this PR. These appear to be drift since the last upstream pre-commit autofix.
  • pyupgrade crashes with TypeError: cannot use a bytes pattern on a string-like object — this is an upstream pyupgrade incompatibility with Python 3.14 that pre-commit.ci is now using; completely independent of this change.
  • flake8 is correctly skipped per ci: skip: [flake8].

I deliberately did not include the reformat of unrelated files in this PR to keep it focused on the action-scaling documentation and _validate_action_shares hook.

The patch itself (43 lines added, 0 removed) passes black, reorder-python-imports, and pyupgrade locally on the file alone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant