Distributed GPU-accelerated MetaTrader 5 strategy backtester and parameter optimizer.
MetaTrader 5's built-in strategy tester runs one parameter combination per CPU
thread. gpu-mt5-bt maps each combination to its own CUDA thread, then shards
the entire parameter grid across a Ray cluster of GPU workers — so optimization
sweeps that take MT5 hours finish in seconds. Results are streamed to Parquet
as shards complete; crashes can be resumed.
Strategies are written in Python, not MQL5. There is no transpilation: you author each strategy as a
`@cuda.jit` kernel using the building blocks in `gpu_mt5_bt.kernels`. See Authoring a strategy below.
```
┌──────────────────┐
│   CLI (Typer)    │
│  gpu-mt5-bt …    │
└────────┬─────────┘
         │
┌────────▼─────────┐
│   Coordinator    │  loads data + grid,
│   (Ray driver)   │  shards work, aggregates results
└────────┬─────────┘
         │ Ray actors
┌───────────────────┼───────────────────┐
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ GPU Worker  │ │ GPU Worker  │ │ GPU Worker  │ … N
│ (1 per GPU) │ │ (1 per GPU) │ │ (1 per GPU) │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ CUDA kernel │ │ CUDA kernel │ │ CUDA kernel │
│ N parallel  │ │ N parallel  │ │ N parallel  │
│ backtests   │ │ backtests   │ │ backtests   │
└─────────────┘ └─────────────┘ └─────────────┘
```
- The bar array lives once in each GPU's memory; every thread on that GPU reads the same shared bars.
- Each GPU thread owns one parameter combination and runs a complete sequential backtest. Strategies are not internally parallelized — that would corrupt sequential trade state.
- The coordinator only sees aggregate metrics, not per-bar data, so cross-node traffic stays small.
- Results stream to `results.parquet` as shards complete; a crashed run resumes from the last completed shard.
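The one-combo-per-thread mapping can be sketched on the CPU. This is an illustrative analogue, not the library's API; the function names and the toy MA-crossover logic here are hypothetical:

```python
import numpy as np

def backtest_one_combo(closes: np.ndarray, fast: int, slow: int) -> float:
    """Sequential backtest for ONE parameter combination -- the same work
    a single CUDA thread would do against the shared bar array."""
    balance, position, entry = 10_000.0, 0, 0.0
    for i in range(slow, len(closes)):
        fast_ma = closes[i - fast:i].mean()
        slow_ma = closes[i - slow:i].mean()
        if position == 0 and fast_ma > slow_ma:
            position, entry = 1, closes[i]            # open long
        elif position == 1 and fast_ma < slow_ma:
            balance += (closes[i] - entry) * 100_000  # close long (1 lot)
            position = 0
    return balance

def sweep(closes: np.ndarray, combos: list[tuple[int, int]]) -> np.ndarray:
    # On the GPU each tuple would map to one thread index; here we just loop.
    return np.array([backtest_one_combo(closes, f, s) for f, s in combos])
```

Note that each combination runs strictly sequentially over the bars; only the *combinations* are parallel, which is why trade state never needs synchronization.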
```bash
pip install -e ".[dev]"           # core + test deps
pip install -e ".[dev,mt5,nvml]"  # add MT5 live loader (Win-only) + GPU monitor
```

Numba CUDA needs an NVIDIA GPU and a recent CUDA toolkit (12.x is fine). On a machine without a GPU, kernels are skipped and the CPU reference path is used, so the test suite still passes; only the speedup is missing.
```bash
# Generate the included synthetic EURUSD H1 sample (one-shot).
python examples/generate_sample_data.py

# Run a 5,000-combo MA-crossover sweep on a local Ray cluster.
gpu-mt5-bt run examples/ma_crossover.yaml

# Render the HTML report from the latest run.
gpu-mt5-bt report runs/<latest>

# What strategies are registered?
gpu-mt5-bt strategies
```

`run` resolves the run directory to `runs/<UTC-timestamp>_<config-name>/` and writes:
```
runs/20260508T101530Z_ma_crossover/
├── config.yaml         # frozen copy of the input config
├── metadata.json       # symbol, timeframe, n_bars, n_combos, started_at
├── results.parquet     # one row per parameter combo
├── trades_top.parquet  # detailed trades for top-N combos per shard
├── _shards_done.txt    # checkpoint for `--resume`
├── report.html         # generated by `gpu-mt5-bt report`
└── logs/run.log
```
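A checkpoint file like `_shards_done.txt` makes crash recovery simple. The sketch below shows one way `--resume` could be driven from it; the helper names are hypothetical and the real implementation may differ:

```python
from pathlib import Path

def pending_shards(run_dir: Path, all_shards: list[int]) -> list[int]:
    """Return shard ids that still need to run.

    `_shards_done.txt` holds one completed shard id per line; on resume
    those shards are skipped."""
    done_file = run_dir / "_shards_done.txt"
    done: set[int] = set()
    if done_file.exists():
        done = {int(tok) for tok in done_file.read_text().split()}
    return [s for s in all_shards if s not in done]

def mark_done(run_dir: Path, shard_id: int) -> None:
    # Append-only writes keep the checkpoint valid even if the run crashes
    # mid-way: a shard id is only present once its results were written.
    with open(run_dir / "_shards_done.txt", "a") as f:
        f.write(f"{shard_id}\n")
```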
| Flag | Meaning |
|---|---|
| `--resume` | Pick up the most recent matching run dir and skip completed shards. |
| `--dry-run` | Print resolved config + grid size and exit (no execution). |
| `--local` | Force in-process execution (no Ray). Useful for debugging. |
| `--device gpu` / `cpu` | Force the device per worker. Default: auto. |
```bash
# Head node
gpu-mt5-bt cluster start --head --num-gpus 2 --num-cpus 8

# Each worker node
gpu-mt5-bt cluster start --address 10.0.0.1:6379 --num-gpus 4

# Driver (anywhere reachable)
# Set ray_address: 'ray://10.0.0.1:10001' in your config and run normally:
gpu-mt5-bt run my_sweep.yaml
```

`distributed.num_gpus_per_worker` in the config controls how many GPUs each Ray actor reserves. With one actor per GPU, each actor caches the bar array and compiled kernel between shards, so only the first shard pays the JIT cost.
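The caching behaviour can be illustrated with a plain class (hypothetical; the real worker is a Ray actor, and the Ray decorator is omitted here so the sketch runs anywhere):

```python
class GpuWorker:
    """Sketch of a per-GPU actor that amortizes one-off setup cost.

    In the real system the bars live in GPU memory and `_kernel` is a
    compiled CUDA kernel; here a counter stands in for the JIT cost."""

    def __init__(self, bars: list[float]):
        self.bars = bars        # uploaded once, reused by every shard
        self._kernel = None     # compiled lazily on the first shard
        self.compile_count = 0

    def _get_kernel(self):
        if self._kernel is None:
            self.compile_count += 1  # stands in for the one-off JIT compile
            self._kernel = lambda combos: [sum(self.bars) * f for f, _ in combos]
        return self._kernel

    def run_shard(self, combos: list[tuple[int, int]]) -> list[float]:
        return self._get_kernel()(combos)
```

Because the actor is long-lived, the second and later `run_shard` calls reuse both the cached bars and the cached kernel.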
Minimal example (`examples/ma_crossover.yaml`):

```yaml
strategy: ma_cross

data:
  source: csv
  path: examples/data/EURUSD_H1.csv
  symbol: EURUSD
  timeframe: H1
  start: 2018-01-01
  end: 2024-12-31

execution:
  starting_balance: 10000
  leverage: 100
  commission_per_lot: 7.0
  slippage_points: 1
  spread_mode: from_bars      # or fixed: 1.5
  stop_out_pct: 0.5
  triple_swap_wednesday: true

position_sizing:
  mode: fixed_lot             # or percent_risk / martingale
  lot: 0.1

optimization:
  fast_period:       { min: 5,   max: 50,  step: 1 }
  slow_period:       { min: 20,  max: 200, step: 1 }
  trailing_stop_atr: { min: 1.0, max: 5.0, step: 0.5 }

distributed:
  ray_address: auto           # 'auto' | 'local' | 'ray://host:10001'
  chunk_size: 10000
  num_gpus_per_worker: 1

output:
  metric_to_optimize: sharpe  # final_equity | sharpe | profit_factor | calmar | sortino
  keep_top_n_trades: 50
```

Every field is validated by Pydantic; typos and missing required fields fail fast with a precise error.
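Under the hood, the `optimization:` block expands into a Cartesian grid that is then cut into `chunk_size` shards. A minimal sketch of that expansion, with hypothetical helper names:

```python
from itertools import product

def expand_grid(spec: dict) -> list[dict]:
    """Expand {name: {min, max, step}} ranges into the full Cartesian grid,
    mirroring the `optimization:` block (illustrative helper)."""
    axes = []
    for name, r in spec.items():
        values, v = [], r["min"]
        while v <= r["max"] + 1e-9:          # inclusive of `max`
            values.append((name, round(v, 10)))
            v += r["step"]
        axes.append(values)
    return [dict(combo) for combo in product(*axes)]

def shard(grid: list, chunk_size: int) -> list[list]:
    # `distributed.chunk_size` combos per shard; the last shard may be short.
    return [grid[i:i + chunk_size] for i in range(0, len(grid), chunk_size)]
```

With the example config above, `fast_period` contributes 46 values and `slow_period` 181, so the grid (before the trailing-stop axis) already has 46 × 181 = 8,326 combinations.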
A strategy is a Python class registered into a global registry. Each strategy must provide:

- a CPU-only reference implementation (`run_cpu`) used by tests and the CPU fallback path
- a Numba CUDA kernel (`build_kernel`) with signature `(bars, params, exec_cfg, out_metrics, out_trades, n_bars)`

The two implementations must agree numerically; the test suite enforces this on a fixed-seed synthetic series.
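The agreement check follows a common pattern: run both implementations on the same fixed-seed input and assert the outputs match. The sketch below uses two SMA implementations as stand-ins for the reference and fast paths (the function names are illustrative, not the library's API):

```python
import numpy as np

def sma_reference(x: np.ndarray, period: int) -> np.ndarray:
    """Scalar loop -- plays the role of the CPU reference path."""
    out = np.full_like(x, np.nan)
    for i in range(period - 1, len(x)):
        out[i] = x[i - period + 1:i + 1].sum() / period
    return out

def sma_vectorized(x: np.ndarray, period: int) -> np.ndarray:
    """Vectorized version -- plays the role of the 'fast' implementation."""
    out = np.full_like(x, np.nan)
    c = np.cumsum(x)
    out[period - 1] = c[period - 1] / period
    out[period:] = (c[period:] - c[:-period]) / period
    return out
```

A fixed seed keeps the comparison reproducible: `np.random.default_rng(seed)` yields the same series on every run, so a numerical divergence is always a real bug, never noise.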
The execution machinery exposes ready-made device functions you can call from your kernel:
| Helper | Purpose |
|---|---|
| `apply_spread_device` | Add/subtract spread to a fill price |
| `commission_device` | Per-lot commission |
| `fx_pnl_device` | P&L in account currency |
| `lot_for_combo_device` | Lot size given the configured sizing mode |
| `swap_for_bar_device` | Per-bar swap accrual (with Wed triple-swap) |
| `record_trade_device` | Write a trade row into the output buffer |
| `sma_at_device`, `ema_at_device`, `rsi_at_device`, `atr_at_device`, `donchian_at_device` | Indicators evaluated at one bar |
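The "indicator at one bar" shape matters on the GPU: each thread evaluates the indicator at the current bar instead of materializing a whole indicator array per combo. A CPU analogue of such a helper (the signature is an assumption, modelled on `sma_at_device` above):

```python
def sma_at(close: list[float], i: int, period: int) -> float:
    """SMA over the `period` bars ending at bar i (inclusive).

    CPU analogue of a device helper like `sma_at_device`: evaluating at
    one bar means each GPU thread needs only O(1) extra memory, rather
    than one full indicator array per parameter combination."""
    if i + 1 < period:
        return float("nan")  # not enough history yet
    total = 0.0
    for j in range(i - period + 1, i + 1):
        total += close[j]
    return total / period
```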
See `src/gpu_mt5_bt/strategies/ma_cross.py` for a complete worked example
(Python `_run_one_combo_cpu` + `@cuda.jit` `ma_cross_kernel`). Once authored,
register it from `strategies/__init__.py` and reference it by name in YAML.
The reference strategies that ship in-box:

| Name | Idea |
|---|---|
| `ma_cross` | Fast/slow SMA crossover with optional ATR trailing stop |
| `rsi_meanrev` | Buy on RSI cross-up through oversold, sell on cross-down through overbought |
| `donchian_breakout` | N-bar channel breakout entry, opposite-channel exit, ATR hard stop |
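As a plain-Python restatement of the `ma_cross` entry rule (illustrative only; the shipped kernel's exact tie-breaking and exit handling may differ):

```python
def ma_cross_signal(close: list[float], i: int, fast: int, slow: int) -> int:
    """+1 on a fast-over-slow cross at bar i, -1 on the opposite cross,
    0 otherwise."""
    def sma(end: int, period: int) -> float:
        return sum(close[end - period + 1:end + 1]) / period

    if i < slow:  # need full history for both MAs at bars i-1 and i
        return 0
    prev = sma(i - 1, fast) - sma(i - 1, slow)
    curr = sma(i, fast) - sma(i, slow)
    if prev <= 0 < curr:
        return +1
    if prev >= 0 > curr:
        return -1
    return 0
```

A "cross" is defined by the *sign change* of the fast-minus-slow spread between consecutive bars, which is why the signal needs both the previous and current bar.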
Export an MT5 strategy-tester report (right-click chart → Save As Report →
`.htm`) and run:

```bash
gpu-mt5-bt validate runs/<latest> path/to/StrategyTester.htm
```

The validator parses the summary fields and trade list out of the HTML, picks
the GPU run with the highest `final_equity` from `results.parquet`, and prints
a side-by-side diff. Acceptance: final equity within 0.1% (configurable
with `--tolerance`).
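The acceptance criterion reduces to a relative-difference check on final equity, sketched below (hypothetical helper; the real validator compares more fields):

```python
def within_tolerance(mt5_equity: float, gpu_equity: float,
                     tolerance: float = 0.001) -> bool:
    """True when the two final-equity figures agree within `tolerance`
    relative to the MT5 figure (0.1% by default, as --tolerance would)."""
    return abs(gpu_equity - mt5_equity) <= tolerance * abs(mt5_equity)
```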
```bash
pytest -v                                     # all tests
pytest -m "not gpu"                           # skip GPU-only tests
pytest tests/integration/test_distributed.py  # only Ray integration
```

GPU tests are auto-skipped when CUDA isn't available; same for Ray and MT5. Coverage target is ≥80% on non-kernel code, ≥60% overall.
MIT.