
gpu-mt5-bt

Distributed GPU-accelerated MetaTrader 5 strategy backtester and parameter optimizer.

MetaTrader 5's built-in strategy tester runs one parameter combination per CPU thread. gpu-mt5-bt maps each combination to its own CUDA thread, then shards the entire parameter grid across a Ray cluster of GPU workers — so optimization sweeps that take MT5 hours finish in seconds. Results are streamed to Parquet as shards complete; crashes can be resumed.

Strategies are written in Python, not MQL5. There is no transpilation — you author each strategy as a @cuda.jit kernel using the building blocks in gpu_mt5_bt.kernels. See Authoring a strategy below.

Architecture

                         ┌──────────────────┐
                         │   CLI (Typer)    │
                         │   gpu-mt5-bt …   │
                         └────────┬─────────┘
                                  │
                         ┌────────▼─────────┐
                         │  Coordinator     │   loads data + grid,
                         │  (Ray driver)    │   shards work, aggregates results
                         └────────┬─────────┘
                                  │ Ray actors
              ┌───────────────────┼───────────────────┐
       ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
       │ GPU Worker  │    │ GPU Worker  │    │ GPU Worker  │ … N
       │ (1 per GPU) │    │ (1 per GPU) │    │ (1 per GPU) │
       └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
       │ CUDA kernel │    │ CUDA kernel │    │ CUDA kernel │
       │ N parallel  │    │ N parallel  │    │ N parallel  │
       │ backtests   │    │ backtests   │    │ backtests   │
       └─────────────┘    └─────────────┘    └─────────────┘
  • The bar array lives once in each GPU's memory; every thread on that GPU reads the same shared bars.
  • Each GPU thread owns one parameter combination and runs a complete sequential backtest. Strategies are not internally parallelized — that would corrupt sequential trade state.
  • The coordinator only sees aggregate metrics, not per-bar data, so cross-node traffic stays small.
  • Results stream to results.parquet; a crashed run resumes from the last completed shard.
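The one-thread-per-combo mapping above implies each CUDA thread must turn its flat index into a coordinate in the parameter grid. A hedged sketch of one way to do that decoding (row-major `divmod`; the library's actual index scheme may differ):

```python
def decode_combo(flat_idx, grid_sizes):
    """Map a flat thread index to per-parameter indices (row-major).

    Illustrative only: gpu-mt5-bt's actual index scheme may differ.
    """
    coords = []
    for size in reversed(grid_sizes):
        flat_idx, r = divmod(flat_idx, size)
        coords.append(r)
    return tuple(reversed(coords))

# e.g. 46 fast periods x 181 slow periods x 9 trailing-stop values
sizes = [46, 181, 9]
assert decode_combo(0, sizes) == (0, 0, 0)
assert decode_combo(9, sizes) == (0, 1, 0)
assert decode_combo(46 * 181 * 9 - 1, sizes) == (45, 180, 8)
```

On the GPU, `flat_idx` would come from `cuda.grid(1)`; each thread then reads its own parameter values and runs its sequential backtest against the shared bar array.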

Install

pip install -e ".[dev]"           # core + test deps
pip install -e ".[dev,mt5,nvml]"  # add MT5 live loader (Win-only) + GPU monitor

Numba CUDA needs an NVIDIA GPU and a recent CUDA toolkit (12.x is fine). On a machine without a GPU, kernels are skipped and the CPU reference path is used — so the test suite still passes; only the speedup is missing.

Quickstart

# Generate the included synthetic EURUSD H1 sample (one-shot).
python examples/generate_sample_data.py

# Run a 5,000-combo MA-crossover sweep on a local Ray cluster.
gpu-mt5-bt run examples/ma_crossover.yaml

# Render the HTML report from the latest run.
gpu-mt5-bt report runs/<latest>

# What strategies are registered?
gpu-mt5-bt strategies

`gpu-mt5-bt run` resolves the run directory to runs/<UTC-timestamp>_<config-name>/ and writes:

runs/20260508T101530Z_ma_crossover/
├── config.yaml          # frozen copy of the input config
├── metadata.json        # symbol, timeframe, n_bars, n_combos, started_at
├── results.parquet      # one row per parameter combo
├── trades_top.parquet   # detailed trades for top-N combos per shard
├── _shards_done.txt     # checkpoint for `--resume`
├── report.html          # generated by `gpu-mt5-bt report`
└── logs/run.log

Common flags

| Flag | Meaning |
| --- | --- |
| `--resume` | Pick up the most recent matching run dir and skip completed shards. |
| `--dry-run` | Print resolved config + grid size and exit (no execution). |
| `--local` | Force in-process execution (no Ray). Useful for debugging. |
| `--device gpu` / `--device cpu` | Force the device per worker. Default: auto. |
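The `--resume` checkpoint logic can be sketched as follows, assuming (hypothetically; the real file format may differ) that `_shards_done.txt` lists one completed shard id per line:

```python
def pending_shards(all_shard_ids, checkpoint_text):
    """Filter out shards already recorded in _shards_done.txt.

    Assumes the checkpoint is one shard id per line (illustrative;
    the real file format may differ).
    """
    done = {line.strip() for line in checkpoint_text.splitlines() if line.strip()}
    return [sid for sid in all_shard_ids if sid not in done]

# Shards 0 and 1 completed before the crash; only 2 and 3 are re-run.
remaining = pending_shards(["0", "1", "2", "3"], "0\n1\n")
assert remaining == ["2", "3"]
```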

Multi-machine cluster setup

# Head node
gpu-mt5-bt cluster start --head --num-gpus 2 --num-cpus 8

# Each worker node
gpu-mt5-bt cluster start --address 10.0.0.1:6379 --num-gpus 4

# Driver (anywhere reachable)
# Set ray_address: 'ray://10.0.0.1:10001' in your config and run normally:
gpu-mt5-bt run my_sweep.yaml

distributed.num_gpus_per_worker in the config controls how many GPUs each Ray actor reserves. With one actor per GPU, each actor caches the bar array and compiled kernel between shards so only the first shard pays JIT cost.
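The per-actor caching pattern described above can be illustrated with a plain class standing in for a `@ray.remote(num_gpus=1)` actor (names and internals are illustrative, not the library's actual code):

```python
class GpuWorker:
    """Sketch of the per-GPU caching pattern: a plain class standing in
    for a @ray.remote(num_gpus=1) actor. Names are illustrative."""

    def __init__(self):
        self._bars_on_device = None   # uploaded once, reused across shards
        self._kernel = None           # compiled once, reused across shards
        self.uploads = 0
        self.compiles = 0

    def run_shard(self, bars, param_shard):
        if self._bars_on_device is None:
            self._bars_on_device = bars  # real code: cuda.to_device(bars)
            self.uploads += 1
        if self._kernel is None:
            # Stand-in for the one-time @cuda.jit compilation cost.
            self._kernel = lambda b, p: [sum(b) * x for x in p]
            self.compiles += 1
        return self._kernel(self._bars_on_device, param_shard)

w = GpuWorker()
w.run_shard([1.0, 2.0], [0.1, 0.2])
w.run_shard([1.0, 2.0], [0.3])
assert (w.uploads, w.compiles) == (1, 1)  # only the first shard pays the cost
```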

Configuration

Minimal example (examples/ma_crossover.yaml):

strategy: ma_cross

data:
  source: csv
  path: examples/data/EURUSD_H1.csv
  symbol: EURUSD
  timeframe: H1
  start: 2018-01-01
  end:   2024-12-31

execution:
  starting_balance: 10000
  leverage: 100
  commission_per_lot: 7.0
  slippage_points: 1
  spread_mode: from_bars         # or fixed: 1.5
  stop_out_pct: 0.5
  triple_swap_wednesday: true

position_sizing:
  mode: fixed_lot                # or percent_risk / martingale
  lot: 0.1

optimization:
  fast_period: { min: 5,  max: 50,  step: 1 }
  slow_period: { min: 20, max: 200, step: 1 }
  trailing_stop_atr: { min: 1.0, max: 5.0, step: 0.5 }

distributed:
  ray_address: auto              # 'auto' | 'local' | 'ray://host:10001'
  chunk_size: 10000
  num_gpus_per_worker: 1

output:
  metric_to_optimize: sharpe     # final_equity | sharpe | profit_factor | calmar | sortino
  keep_top_n_trades: 50
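The optimization block above defines a Cartesian parameter grid. A quick way to sanity-check the sweep size before launching (the same number `--dry-run` reports), assuming inclusive min/max bounds as the spec suggests and before any validity filtering the runner might apply (e.g. requiring fast_period < slow_period):

```python
def axis_len(min_v, max_v, step):
    # Inclusive range, as the min/max/step spec above suggests.
    return int(round((max_v - min_v) / step)) + 1

n_combos = (
    axis_len(5, 50, 1)         # fast_period        -> 46 values
    * axis_len(20, 200, 1)     # slow_period        -> 181 values
    * axis_len(1.0, 5.0, 0.5)  # trailing_stop_atr  -> 9 values
)
assert n_combos == 74_934
```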

Every field is validated by Pydantic; typos and missing required fields fail fast with a precise error.
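The fail-fast behavior can be illustrated with a toy Pydantic model (a hypothetical stand-in: field names are copied from the example YAML, but the library's actual schema may differ):

```python
from pydantic import BaseModel, ValidationError

class ExecutionCfg(BaseModel):
    """Toy stand-in for the real config schema; the actual model
    in gpu_mt5_bt may differ."""
    starting_balance: float
    leverage: int
    commission_per_lot: float

try:
    ExecutionCfg(starting_balance=10000, leverage=100)  # commission missing
except ValidationError as e:
    # The error names the offending field, so typos surface immediately.
    assert "commission_per_lot" in str(e)
```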

Authoring a strategy

A strategy is a Python class registered into a global registry. Each strategy must provide:

  • a CPU-only reference implementation (run_cpu) used by tests and the CPU fallback path
  • a Numba CUDA kernel (build_kernel) with signature (bars, params, exec_cfg, out_metrics, out_trades, n_bars)

The two implementations must agree numerically — the test suite enforces this on a fixed-seed synthetic series.
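That agreement check can be sketched with two equivalent SMA implementations standing in for the CPU and GPU paths (illustrative; the real suite compares run_cpu against the kernel's output buffers):

```python
import numpy as np

def sma_loop(x, n):
    """Naive per-bar SMA: the sequential shape a kernel-style path takes."""
    out = np.full(len(x), np.nan)
    for i in range(n - 1, len(x)):
        out[i] = x[i - n + 1 : i + 1].sum() / n
    return out

def sma_vectorized(x, n):
    """Vectorized reference, standing in for the CPU path."""
    c = np.convolve(x, np.ones(n) / n, mode="valid")
    return np.concatenate([np.full(n - 1, np.nan), c])

rng = np.random.default_rng(0)            # fixed seed, as the test suite uses
series = rng.standard_normal(500).cumsum() + 100
np.testing.assert_allclose(sma_loop(series, 20), sma_vectorized(series, 20))
```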

The execution machinery exposes ready-made device functions you can call from your kernel:

| Helper | Purpose |
| --- | --- |
| `apply_spread_device` | Add/subtract spread to a fill price |
| `commission_device` | Per-lot commission |
| `fx_pnl_device` | P&L in account currency |
| `lot_for_combo_device` | Lot size given the configured sizing mode |
| `swap_for_bar_device` | Per-bar swap accrual (with Wed triple-swap) |
| `record_trade_device` | Write a trade row into the output buffer |
| `sma_at_device`, `ema_at_device`, `rsi_at_device`, `atr_at_device`, `donchian_at_device` | Indicators evaluated at one bar |
See src/gpu_mt5_bt/strategies/ma_cross.py for a complete worked example (Python _run_one_combo_cpu + @cuda.jit ma_cross_kernel). Once authored, register it from strategies/__init__.py and reference it by name in YAML.
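A minimal sketch of what a run_cpu-style reference loop might look like, signal logic only (fills, costs, and the trailing stop are omitted, and the function name is hypothetical, not the shipped API):

```python
import numpy as np

def ma_cross_signals(close, fast, slow):
    """Return a +1/-1 position per bar from an SMA crossover.

    Illustrative reference-path sketch; the shipped ma_cross strategy
    also handles fills, costs, and the optional ATR trailing stop.
    """
    n = len(close)
    pos = np.zeros(n, dtype=np.int8)
    for i in range(slow, n):
        f = close[i - fast + 1 : i + 1].mean()
        s = close[i - slow + 1 : i + 1].mean()
        pos[i] = 1 if f > s else -1 if f < s else pos[i - 1]
    return pos

close = np.array([1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3], dtype=float)
sig = ma_cross_signals(close, fast=2, slow=3)
assert sig[3] == 1    # rising segment: fast SMA above slow SMA
assert sig[7] == -1   # falling segment: fast SMA below slow SMA
```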

The reference strategies that ship in-box:

| Name | Idea |
| --- | --- |
| `ma_cross` | Fast/slow SMA crossover with optional ATR trailing stop |
| `rsi_meanrev` | Buy on RSI cross-up through oversold, sell on cross-down through overbought |
| `donchian_breakout` | N-bar channel breakout entry, opposite-channel exit, ATR hard stop |

Validating against MT5

Export an MT5 strategy-tester report (right-click chart → Save As Report.htm) and run:

gpu-mt5-bt validate runs/<latest> path/to/StrategyTester.htm

The validator parses the summary fields and trade list out of the HTML, picks the GPU run with the highest final_equity from results.parquet, and prints a side-by-side diff. Acceptance: final equity within 0.1% (configurable with --tolerance).
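The 0.1% acceptance check amounts to a relative-difference comparison, which can be sketched as (an illustrative helper, not the validator's actual code):

```python
def within_tolerance(gpu_equity, mt5_equity, tolerance=0.001):
    """True when the relative final-equity difference is within tolerance
    (0.1% by default, matching the validator's documented default)."""
    return abs(gpu_equity - mt5_equity) / abs(mt5_equity) <= tolerance

assert within_tolerance(10_050.0, 10_045.0) is True    # ~0.05% apart: pass
assert within_tolerance(10_200.0, 10_000.0) is False   # 2% apart: fail
```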

Tests

pytest -v                                        # all tests
pytest -m "not gpu"                              # skip GPU-only tests
pytest tests/integration/test_distributed.py     # only Ray integration

GPU tests are auto-skipped when CUDA isn't available; same for Ray and MT5. Coverage target is ≥80% on non-kernel code, ≥60% overall.

License

MIT.
