Document step() action-scaling contract; add _validate_action_shares hook #1416
Open
guitaristsam wants to merge 1 commit into AI4Finance-Foundation:master from
Conversation
`StockTradingEnv.step()` rescales the input `actions` from [-1, 1] to
integer share counts via `actions = actions * self.hmax; actions.astype(int)`
before any `_buy_stock` / `_sell_stock` call sees it. This is correct, but
subclasses overriding `step()` to add custom action constraints commonly
make one of two mistakes:
1. Validate / clip / round the raw [-1, 1] action BEFORE the parent's
`* self.hmax` rescaling. Net effect: the action grid collapses to
{-hmax, 0, +hmax} with no fine-grained levels in between, and the
policy `std` gets stuck during PPO training because there's no
gradient benefit to outputting fractional values.
2. Re-implement the entire `step()` flow to inject custom logic, which
means the subclass has to track FinRL's internal state-layout
conventions (`state[1:1+stock_dim]` is prices, not shares — a separate
trap that this PR does not address but is mentioned as related).
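Mistake 1 is easy to reproduce in isolation. A minimal sketch (the `hmax` value and the sample actions here are illustrative; in FinRL, `hmax` is a constructor argument, commonly 100 in the tutorials):

```python
import numpy as np

hmax = 100  # illustrative; hmax is a constructor argument in FinRL

raw = np.array([-0.75, -0.05, 0.0, 0.4, 1.0])  # policy output in [-1, 1]

# Correct order: scale first, then discretize -> fine-grained share grid
correct = (raw * hmax).astype(int)              # [-75, -5, 0, 40, 100]

# Mistake 1: rounding the raw action BEFORE the * hmax rescaling
# collapses the whole grid to {-hmax, 0, +hmax}
collapsed = (np.round(raw) * hmax).astype(int)  # [-100, 0, 0, 0, 100]
```

Every raw action that rounds to zero maps to zero shares, so the policy gets no signal for outputting fine-grained fractional values.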
This PR adds two non-breaking changes:
- A docstring on `step()` explicitly stating the [-1, 1] -> integer-shares
rescaling contract and pointing subclassers at the new hook.
- A `_validate_action_shares(action_shares)` method that runs after the
scaling and turbulence override but before the buy/sell loop. Default
implementation is identity, so existing subclasses are unaffected.
Subclasses that want post-rescaling constraints (per-step share-delta
caps, integer-only enforcement, tighter budget pre-checks, no-shorting)
can now override this single method instead of re-implementing `step()`.
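To make the contract concrete, here is a minimal sketch of the pattern. The hook name and its call position (after rescaling and the turbulence override, before the buy/sell loop) are from this PR; the env internals are heavily simplified, and the `CappedEnv` subclass is purely illustrative:

```python
import numpy as np

class StockTradingEnv:  # simplified stand-in for FinRL's env
    hmax = 100  # max shares traded per step, per stock (illustrative)

    def step(self, actions):
        """Rescale actions from [-1, 1] to integer share counts
        (actions * hmax) before any buy/sell logic sees them."""
        action_shares = (np.asarray(actions) * self.hmax).astype(int)
        # (real env: turbulence override happens here)
        action_shares = self._validate_action_shares(action_shares)
        # (real env: buy/sell loop, state update, reward follow)
        return action_shares  # real step() returns (state, reward, ...)

    def _validate_action_shares(self, action_shares):
        # Default is the identity, so existing subclasses are unaffected.
        return action_shares

class CappedEnv(StockTradingEnv):
    """Illustrative subclass: a per-step share-delta cap of 10 shares,
    applied post-rescaling via the single hook, with no step() rewrite."""
    def _validate_action_shares(self, action_shares):
        return np.clip(action_shares, -10, 10)
```

With this in place, `CappedEnv().step([0.5, -1.0])` yields share deltas `[10, -10]` instead of `[50, -100]`, while base-class behaviour is unchanged.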
Note on FinRL-X: I'm aware the project is moving toward FinRL-X, which
uses a weight-centric interface and has no Gym env at all. This PR
targets legacy FinRL specifically because the action-scaling trap is an
issue for users following the existing educational/research tutorials,
which explicitly remain on this repo.
Real-world reproducer: I hit this trap while subclassing StockTradingEnv
to enforce integer-share trading on Indian equities. v6 of my project
silently collapsed the policy's action grid; the smoking gun was the
PPO `std` parameter stuck at 0.97 across all 200k training timesteps.
Tracking it down required reading line 314 of env_stocktrading.py and
the offending subclass side by side. Full debugging diary, including 13
versions of single-variable iteration, at
https://github.com/guitaristsam/RL_PPO_NIFTY50 (see the "Action scaling"
section of CLAUDE.md).
Author
The failing pre-commit.ci check is not caused by this PR. I deliberately
did not include the reformat of unrelated files, to keep this PR focused
on the action-scaling documentation. The patch itself (43 lines added, 0
removed) passes black, reorder-python-imports, and pyupgrade locally on
the file alone.