Look-ahead bias in swing_highs_lows() — backtest results are inflated #101

@joeytran369

swing_highs_lows() uses a centered window that looks into future bars:

# smc.py line 153-163
ohlc["high"].shift(-(swing_length // 2)).rolling(swing_length).max()

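shift(-(swing_length // 2)) pulls future rows back onto the current row, so the rolling window at each bar includes up to swing_length // 2 bars that have not happened yet. The function therefore sees price data that would not exist in real-time trading. This is look-ahead bias, and it inflates backtest results significantly.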
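
To make the effect concrete, here is a minimal pure-Python sketch. `centered_rolling_max` and `trailing_rolling_max` are hypothetical helpers I wrote for illustration, not functions from the library. The first is meant to mirror what `shift(-(n // 2)).rolling(n).max()` computes, assuming pandas' default behaviour of returning NaN for incomplete windows (represented here as `None`). Changing only a bar that comes after bar 4 changes the centered value at bar 4, while the trailing value stays the same.

```python
def centered_rolling_max(values, swing_length):
    """Stand-in for values.shift(-(n // 2)).rolling(n).max() (assumed equivalent)."""
    k = swing_length // 2
    out = [None] * len(values)
    # At bar i the window covers original bars i-n+1+k .. i+k, i.e. k future bars.
    for i in range(swing_length - 1, len(values) - k):
        out[i] = max(values[i - swing_length + 1 + k : i + k + 1])
    return out


def trailing_rolling_max(values, swing_length):
    """Stand-in for values.rolling(n).max(): only bars up to and including i."""
    out = [None] * len(values)
    for i in range(swing_length - 1, len(values)):
        out[i] = max(values[i - swing_length + 1 : i + 1])
    return out


highs = [10, 11, 12, 11, 10, 9, 10, 11, 10, 9]
spiked = list(highs)
spiked[6] = 99  # modify only a bar that lies in the future of bar 4

n = 5
centered_before = centered_rolling_max(highs, n)
centered_after = centered_rolling_max(spiked, n)
trailing_before = trailing_rolling_max(highs, n)
trailing_after = trailing_rolling_max(spiked, n)

# The centered value at bar 4 reacts to a change at bar 6; the trailing one does not.
print(centered_before[4], centered_after[4])
print(trailing_before[4], trailing_after[4])
```

If the value at bar 4 depends on bar 6, any signal derived from it at bar 4 could not have been produced live.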

I tested the same trading logic with and without this bias on XAUUSD M15 (10 years, 280k bars):

| Swing Method | Trades | Win Rate | Profit Factor |
|---|---|---|---|
| Centered window (current) | 177 | 81.4% | 7.32 |
| No look-ahead (confirm bars) | 106 | 52.8% | 1.82 |

Profit factor drops from 7.32 to 1.82, and win rate from 81.4% to 52.8%, when the bias is removed. Anyone backtesting with this function will get inflated results.

A simple fix using confirm bars (only past data, no future):

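# Assumes highs is a NumPy array of bar highs, n = len(highs),
# and swing_high is an array of length n pre-filled with NaN.
for i in range(lookback + confirm_bars, n):
    candidate = i - confirm_bars  # bar under test; bar i is the last confirming bar
    window = highs[candidate - lookback:candidate + 1]
    if highs[candidate] == window.max():
        # Confirmation reads bars candidate+1 .. i, all already closed at bar i
        if all(highs[candidate + cb] < highs[candidate] for cb in range(1, confirm_bars + 1)):
            swing_high[i] = highs[candidate]  # recorded at i, when the swing becomes known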

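The candidate bar must be the highest of itself and the preceding lookback bars, and it is confirmed once the next confirm_bars bars have all come in lower. The swing is recorded at bar i, the last confirming bar, so nothing later than the current bar is ever read.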
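
For anyone who wants to try this, here is a self-contained version of the loop above that runs as-is. It is only a sketch: the function name `confirmed_swing_highs`, the use of plain lists, and `None` for "no swing at this bar" are my choices for illustration, not the library's API. The `is_causal` helper (also hypothetical) truncates the series at every length and checks that earlier outputs never change, which is the property a bias-free detector should have.

```python
def confirmed_swing_highs(highs, lookback, confirm_bars):
    """Mark a swing high at the bar where it becomes confirmed (sketch)."""
    n = len(highs)
    swing_high = [None] * n
    for i in range(lookback + confirm_bars, n):
        candidate = i - confirm_bars
        window = highs[candidate - lookback : candidate + 1]
        # Candidate must be the highest of itself and the preceding lookback bars.
        if highs[candidate] != max(window):
            continue
        # Every bar from candidate+1 through i must be strictly lower.
        if all(highs[candidate + cb] < highs[candidate]
               for cb in range(1, confirm_bars + 1)):
            swing_high[i] = highs[candidate]
    return swing_high


def is_causal(highs, lookback, confirm_bars):
    """True if the output on every prefix equals the prefix of the full output."""
    full = confirmed_swing_highs(highs, lookback, confirm_bars)
    return all(
        confirmed_swing_highs(highs[:m], lookback, confirm_bars) == full[:m]
        for m in range(len(highs) + 1)
    )


highs = [10, 11, 12, 11, 10, 9, 10, 13, 12, 11, 10]
result = confirmed_swing_highs(highs, lookback=2, confirm_bars=2)
print(result)
print(is_causal(highs, lookback=2, confirm_bars=2))
```

Note that each swing is reported confirm_bars bars after the actual high (the high at index 2 shows up at index 4 here). That delay is the cost of not looking ahead.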

I saw PR #95 adds a causal parameter that shifts outputs forward. That prevents using future data for signals but the detection itself still uses the centered window. The confirm bars approach only uses past data from the start.

Would be nice to have a causal=True option that uses a genuinely bias-free algorithm. Happy to submit a PR if there's interest.
