📊 Backtesting & Validation Framework

Rigorous statistical validation ensuring strategy robustness before live deployment

Walk-Forward Analysis

Out-of-sample validation methodology

Every strategy undergoes walk-forward analysis with a 12-month in-sample optimization window and 3-month out-of-sample testing window. The window advances monthly, producing a continuous series of out-of-sample results that simulate real-time performance.

This methodology guards against overfitting because every performance metric is computed on data the model did not see during parameter optimization. Only strategies that demonstrate consistent out-of-sample performance are promoted to paper trading.
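
The window schedule described above can be sketched as follows. This is a minimal illustration, assuming windows aligned to the first day of each month and half-open date intervals; the function names `add_months` and `walk_forward_windows` are hypothetical and not taken from the framework itself.

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date by a whole number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

def walk_forward_windows(start, end, in_sample=12, out_sample=3, step=1):
    """Yield (is_start, is_end, oos_end) tuples.

    In-sample data covers [is_start, is_end); out-of-sample data covers
    [is_end, oos_end). Window lengths and step are in months.
    """
    is_start = start
    while True:
        is_end = add_months(is_start, in_sample)
        oos_end = add_months(is_end, out_sample)
        if oos_end > end:
            break  # not enough data left for a full out-of-sample window
        yield is_start, is_end, oos_end
        is_start = add_months(is_start, step)

# Hypothetical two-year data span
windows = list(walk_forward_windows(date(2020, 1, 1), date(2022, 1, 1)))
for is_start, is_end, oos_end in windows:
    print(f"optimize on [{is_start}, {is_end}), test on [{is_end}, {oos_end})")
```

With a 3-month test window and a 1-month step, consecutive out-of-sample windows overlap by two months, so overlapping results would need to be averaged or de-duplicated before being stitched into a single series.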

Monte Carlo Simulation

10,000-permutation stress testing

Trade sequences are randomly permuted 10,000 times to generate a distribution of possible equity curves. This shows how much of the realized equity path depends on the order in which trades happened to occur, which helps separate luck from skill, and provides confidence intervals for key metrics.

95th Percentile Max DD

Maximum drawdown that is not exceeded in 95% of simulations

Ruin Probability

Percentage of paths hitting 50% drawdown

Profit Factor Range

5th–95th percentile of the ratio of gross profit to gross loss

Recovery Time

Median time to recover from max drawdown
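
The permutation procedure and the first two metrics above can be sketched as follows. This is a minimal illustration using only the standard library; the trade returns are made-up numbers, the function names are hypothetical, and returns are assumed to compound per trade.

```python
import math
import random

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (fraction)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst

def permutation_drawdowns(trade_returns, n_paths=10_000, seed=0):
    """Shuffle the trade order n_paths times; return sorted max drawdowns."""
    rng = random.Random(seed)
    trades = list(trade_returns)
    results = []
    for _ in range(n_paths):
        rng.shuffle(trades)
        results.append(max_drawdown(trades))
    return sorted(results)

def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, q in (0, 100]."""
    n = len(sorted_values)
    return sorted_values[max(0, math.ceil(q * n / 100) - 1)]

# Hypothetical per-trade returns (200 trades)
trade_returns = [0.02, -0.01, 0.015, -0.03, 0.01, 0.025, -0.02, 0.005] * 25
dds = permutation_drawdowns(trade_returns, n_paths=1000)

dd_95 = percentile(dds, 95)                                   # 95th percentile max DD
ruin_probability = sum(dd >= 0.50 for dd in dds) / len(dds)   # share of paths hitting 50% DD
print(f"95th percentile max DD: {dd_95:.1%}, ruin probability: {ruin_probability:.1%}")
```

A pure reordering of the same trades leaves the final equity and the profit factor unchanged, so a spread in profit factor presumably comes from resampling trades with replacement rather than from permutation alone; the text does not specify which.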

Realistic Cost Model

Friction-adjusted performance

Slippage

0.02% per side for liquid ETFs, 0.05% for individual equities

Commissions

$0.005/share (Alpaca), 0.1% taker fee (crypto)

Spread Impact

Half bid-ask spread applied to each trade at time of signal

Market Impact

Square-root model for position sizes above 1% of average daily volume (ADV)

Borrowing Costs

Short positions accrue a borrow cost at Fed Funds + 0.5%
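
The components above could be combined per trade roughly as follows. This is a sketch, not the framework's actual cost engine: the slippage, commission, half-spread, and borrow figures come from the text, while the impact coefficient, the use of daily volatility in the square-root term, and the 360-day count are assumptions made for illustration.

```python
import math

# Per-side slippage as a fraction of notional (figures from the text)
SLIPPAGE = {"etf": 0.0002, "equity": 0.0005}

def per_side_cost(notional, shares, asset, spread_pct, adv_shares, daily_vol,
                  commission_per_share=0.005, impact_coeff=1.0):
    """Estimated cost in dollars for one side of a trade.

    impact_coeff and daily_vol parameterize a generic square-root impact
    term; both are assumptions, since the text only names the model.
    """
    slippage = SLIPPAGE[asset] * notional
    commission = commission_per_share * shares
    half_spread = 0.5 * spread_pct * notional
    participation = shares / adv_shares
    impact = 0.0
    if participation > 0.01:  # impact applied only above 1% of ADV
        impact = impact_coeff * daily_vol * math.sqrt(participation) * notional
    return slippage + commission + half_spread + impact

def borrow_cost(notional, fed_funds_rate, days_held, day_count=360):
    """Short borrow cost at Fed Funds + 0.5% (day-count basis is an assumption)."""
    return notional * (fed_funds_rate + 0.005) * days_held / day_count

# Small ETF order: 0.01% of ADV, so no impact term
small = per_side_cost(10_000, 100, "etf", 0.0002, 1_000_000, 0.02)
# Large equity order: 4% of ADV, so the impact term dominates
large = per_side_cost(4_000_000, 40_000, "equity", 0.0004, 1_000_000, 0.02)
print(f"small: ${small:.2f}, large: ${large:.2f}")
```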

Regime-Conditional Analysis

Performance by market state

All backtests are segmented by HMM regime state (Bull, Transition, Bear). This reveals which strategies perform well in which environments and validates the regime-adaptive allocation logic. A strategy is approved only if it demonstrates positive expectancy in its target regime(s) and does not hemorrhage capital in adverse regimes.
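
The segmentation and approval rule can be sketched as follows. Regime labels are assumed to be already attached to each trade by the HMM; the function names and the `adverse_floor` threshold are hypothetical, since the text does not quantify what counts as hemorrhaging capital.

```python
from collections import defaultdict
from statistics import mean

def expectancy_by_regime(trades):
    """trades: iterable of (regime_label, trade_return).

    Returns {regime: (trade_count, mean_return)}.
    """
    buckets = defaultdict(list)
    for regime, ret in trades:
        buckets[regime].append(ret)
    return {regime: (len(rets), mean(rets)) for regime, rets in buckets.items()}

def approve(stats, target_regimes, adverse_floor=-0.005):
    """Positive expectancy in every target regime, and mean return no worse
    than adverse_floor (a hypothetical threshold) in all other regimes."""
    for regime in target_regimes:
        if regime not in stats or stats[regime][1] <= 0:
            return False
    return all(avg >= adverse_floor
               for regime, (_, avg) in stats.items()
               if regime not in target_regimes)

# Hypothetical labeled trades
trades = [("Bull", 0.02), ("Bull", 0.01), ("Transition", -0.002),
          ("Bear", -0.004), ("Bear", 0.001)]
stats = expectancy_by_regime(trades)
print(stats)
print("approved for Bull:", approve(stats, {"Bull"}))
```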

Statistical Significance

Hypothesis testing framework

t-test on Returns

p < 0.05

Mean return statistically different from zero

Bootstrap Sharpe

Lower bound of 95% CI > 0.5

The entire bootstrap confidence interval of the Sharpe ratio lies above 0.5

Deflated Sharpe Ratio

DSR > 0.95

Accounts for multiple testing, data snooping, and non-normality

Minimum Track Record

200+ trades

Sufficient sample size for statistical reliability
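
The first two tests above can be sketched with the standard library alone. This is an illustration under stated assumptions: the p-value uses a normal approximation to the t-distribution (reasonable only for large samples such as the 200+ trades required here), the Sharpe ratio is annualized assuming daily returns, and the bootstrap resamples returns independently, which ignores autocorrelation. The return series is synthetic and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def t_statistic(returns):
    """One-sample t-statistic for the null hypothesis of zero mean return."""
    return mean(returns) / (stdev(returns) / math.sqrt(len(returns)))

def p_value_normal_approx(t):
    """Two-sided p-value using the normal approximation (large samples only)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio with zero risk-free rate (an assumption)."""
    return mean(returns) / stdev(returns) * math.sqrt(periods_per_year)

def bootstrap_sharpe_ci(returns, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the Sharpe ratio."""
    rng = random.Random(seed)
    n = len(returns)
    samples = sorted(sharpe(rng.choices(returns, k=n)) for _ in range(n_boot))
    lo = samples[int(alpha / 2 * n_boot)]
    hi = samples[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic daily returns for illustration only
gen = random.Random(1)
returns = [gen.gauss(0.001, 0.01) for _ in range(250)]

t = t_statistic(returns)
p = p_value_normal_approx(t)
lo, hi = bootstrap_sharpe_ci(returns)
print(f"t = {t:.2f}, p = {p:.3f}, Sharpe 95% CI = [{lo:.2f}, {hi:.2f}]")
print("passes bootstrap Sharpe test:", lo > 0.5)
```

The Deflated Sharpe Ratio is omitted from this sketch because it additionally requires the number of strategy variants tried and the variance of their Sharpe ratios, which the text does not provide.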