Agent Architecture

A three-layer system where deterministic checks do the heavy lifting and LLMs are called only when they add value.

Operational Phases

PHASE A

Research & Strategy Building

Hypothesis-driven exploration with budget constraints and quality gates on every action. Haiku handles low-cost planning; Sonnet makes the high-stakes decisions. A state machine governs the full research lifecycle.

state machine · budget-constrained · quality gates

PHASE B

Live Deployment Validation

Intensive monitoring during the first hour after go-live. All deterministic checks run in parallel, with LLM analysts on standby. Critical findings trigger an automatic rollback.

5-min intervals · 6 checks active · auto-rollback

PHASE C (Active)

Steady-State Monitoring

Deterministic-first monitoring at 30-minute intervals, at no LLM cost. LLM analysts are invoked only on degradation or anomalies. 24/7 autonomous operation with crash recovery.

93% cost reduction · 24/7 autonomous · crash recovery

Deterministic vs LLM

The key insight: most monitoring cycles find nothing wrong. Running an LLM to confirm "everything is fine" is wasteful. Deterministic checks are instant, free, and catch 95%+ of actionable issues.

Deterministic Checks

Always free, instant

bug_check

Code changes, runtime errors, stack traces

health_check

API connectivity, memory, process heartbeats

signal_check

Signal generation consistency, threshold drift

trade_check

Fill quality, slippage, position reconciliation

backtest_check

Live vs backtest divergence, outcome alignment

drift_check

Performance degradation, regime shift detection

Flags issues → escalates

LLM Analysts

On-demand, only when flagged

trade_analyst

Deep analysis of win/loss patterns and exit timing

investigation_analyst

Root cause analysis when checks flag anomalies

strategy_analyst

Strategy health assessment and adaptation recommendations

Reports to → routes actions

Router

Rule-based state machine, no LLM

Orchestrates the full monitoring cycle. Dispatches deterministic checks first, evaluates results, and only escalates to LLM analysts when anomalies are detected. No AI cost during normal operation.
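The dispatch order described above can be sketched roughly as follows. This is a minimal illustration, not the project's Router: `CheckResult`, `run_cycle`, and the `ESCALATION_RULES` mapping from checks to analysts are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    flagged: bool
    detail: str = ""


# Assumed mapping from a flagged check to the analyst that handles it.
ESCALATION_RULES: Dict[str, str] = {
    "trade_check": "trade_analyst",
    "bug_check": "investigation_analyst",
    "health_check": "investigation_analyst",
    "signal_check": "investigation_analyst",
    "backtest_check": "strategy_analyst",
    "drift_check": "strategy_analyst",
}


def run_cycle(
    checks: Dict[str, Callable[[], CheckResult]],
    analysts: Dict[str, Callable[[List[CheckResult]], str]],
) -> List[str]:
    """Run every deterministic check, then call analysts only for flagged results."""
    results = [check() for check in checks.values()]
    flagged = [r for r in results if r.flagged]
    if not flagged:
        return []  # routine cycle: no analyst (and so no LLM) is called

    # Group flagged results by the analyst responsible for them.
    by_analyst: Dict[str, List[CheckResult]] = {}
    for result in flagged:
        analyst = ESCALATION_RULES.get(result.name, "investigation_analyst")
        by_analyst.setdefault(analyst, []).append(result)

    return [analysts[name](items) for name, items in by_analyst.items()]
```

Because the routing is a plain dictionary lookup, a routine cycle returns before any analyst callable is touched, which is the property the cost figures depend on.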

Research Agent

An autonomous agent that tests trading hypotheses through iterative experimentation. Budget-constrained, goal-directed, with quality gates on every action. Two modes: hypothesis (test a specific idea) and explorer (discover new opportunities).

research-agent

$ helios research --hypothesis "mean-reversion with funding rate filter" --budget 10

[PLAN] Evaluating hypothesis: mean-reversion with synthetic funding rate on BTC-USD

Budget: 10 rounds | Model: Haiku plans, Sonnet decides

[BACKTEST] Establishing baseline with default parameters

→ 100 trades | Sharpe 1.28 | Win rate 41.0% | Max DD -26.4%

Assessment: Viable signal, drawdown needs improvement

[BACKTEST] Testing tighter risk controls with funding rate gate

→ 54 trades | Sharpe 1.75 | Win rate 51.9% | Max DD -15.3%

Assessment: Funding rate filter reduces noise, Sharpe trending up

[OPTIMIZE] Broad parameter sweep across promising region

→ 162 combinations | 5.6% pass rate | Best Sharpe 1.78

Assessment: Tight pass rate indicates selective filtering

[BACKTEST] Validating best config on full 24-month dataset

→ 54 trades | Sharpe 1.78 | Win rate 51.9% | Max DD -16.0% | ROI +156%

Assessment: All constraints passed, consistent across timeframes

[VALIDATE] Running walk-forward validation (IS/OOS split)

→ OOS degradation ratio 1.22x | 2/3 folds profitable | Confidence: HIGH

[VALIDATE] Monte Carlo bootstrap (1,000 resamples)

→ p50 Sharpe: 17.3 | p5 floor: 13.5 | Statistically robust

Assessment: Narrow confidence intervals, edge confirmed

[CONCLUDE] Hypothesis confirmed: deploy to live monitoring

✓ VERDICT: Deploy to live monitoring (Phase B)

Budget used: 7/10 rounds | 8 LLM calls | 5 optimization sweeps

State: CONCLUDED | Next: Live deployment validation

State Machine

PLAN → BASELINE → EXPERIMENT → EVALUATE → CONCLUDE

↺ EVALUATE can loop back to EXPERIMENT if budget allows
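The transitions above could be encoded along these lines. This is a sketch under assumptions: the `goal_met` flag and the exact loop-back rule (goal unmet and rounds remaining) are illustrative guesses, not the agent's documented behavior.

```python
from enum import Enum


class State(Enum):
    PLAN = "PLAN"
    BASELINE = "BASELINE"
    EXPERIMENT = "EXPERIMENT"
    EVALUATE = "EVALUATE"
    CONCLUDE = "CONCLUDE"


def next_state(state: State, rounds_used: int, budget: int, goal_met: bool) -> State:
    """Return the state that follows `state` in the lifecycle shown above."""
    if state is State.PLAN:
        return State.BASELINE
    if state is State.BASELINE:
        return State.EXPERIMENT
    if state is State.EXPERIMENT:
        return State.EVALUATE
    if state is State.EVALUATE:
        # Assumed rule: loop back only while the goal is unmet and budget remains.
        if not goal_met and rounds_used < budget:
            return State.EXPERIMENT
        return State.CONCLUDE
    return State.CONCLUDE  # CONCLUDE is terminal
```

For example, `next_state(State.EVALUATE, 3, 10, False)` returns `State.EXPERIMENT`, while the same call with `rounds_used=10` returns `State.CONCLUDE`.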

Strategy System

Strategies are assembled from pluggable components defined in YAML. New indicators and filters are added by subclassing base types — no core modifications needed. The explorer agent can generate, validate, and integrate new components autonomously.

strategy.yaml

strategy:
  indicators:
    primary:    { type: zscore, source: hl2 }
    momentum:   { type: momentum, source: primary }
    volume:     { type: zscore, source: volume }
    atr:        { type: atr, method: ewm }
  entry_conditions:
    - { indicator: primary, operator: ">=", mirror: true }
  exit_conditions:
    - { indicator: primary, operator: crosses_below }
  risk:
    stop_loss:  { type: atr }
    take_profit: { type: atr }
    sizing:     { type: risk_pct }
# Parameter values loaded at runtime — never committed to config

Extensible Components

New indicators (subclass BaseIndicator), new filters (subclass BaseFilter), new exit rules — all plug-and-play with zero core changes.
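As an illustration of the subclassing pattern, here is a small sketch. The `BaseIndicator` defined below is a hypothetical stand-in with an assumed single `compute` method; the project's real base class and its required methods may look different.

```python
from abc import ABC, abstractmethod
from statistics import mean, pstdev
from typing import List, Optional


class BaseIndicator(ABC):
    """Hypothetical stand-in for the project's base class; the real interface may differ."""

    @abstractmethod
    def compute(self, values: List[float]) -> List[Optional[float]]:
        ...


class RollingZScore(BaseIndicator):
    """Z-score of each value against the trailing `window` values (current value included)."""

    def __init__(self, window: int) -> None:
        if window < 2:
            raise ValueError("window must be at least 2")
        self.window = window

    def compute(self, values: List[float]) -> List[Optional[float]]:
        out: List[Optional[float]] = []
        for i in range(len(values)):
            if i + 1 < self.window:
                out.append(None)  # not enough history yet
                continue
            chunk = values[i + 1 - self.window : i + 1]
            sd = pstdev(chunk)
            # A flat window has zero deviation; report 0.0 rather than divide by zero.
            out.append(0.0 if sd == 0 else (values[i] - mean(chunk)) / sd)
        return out
```

Returning `None` during warm-up and `0.0` for a flat window keeps NaN values out of the output, which matters for the functional stage of the validation pipeline.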

Autonomous Creation

Explorer agent can generate, validate, and integrate new strategy components without human intervention.

Validation Pipeline

1. Structural

Correct inheritance, required methods, type signatures

2. Integration

Component loads, connects to data feeds, no import errors

3. Functional

Generates valid signals, handles edge cases, no NaN outputs

4. Quality

Backtest meets minimum thresholds before integration
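The four stages read as ordered gates where the first failure stops the run. A minimal sketch of that idea follows; `validate_component`, the stage functions, and the `compute` method name are hypothetical, and only structural and functional checks are shown (integration and quality stages would plug into the same list).

```python
import math
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[object], bool]]


def validate_component(component: object, stages: List[Stage]) -> Optional[str]:
    """Run stages in order; return the name of the first failing stage, or None."""
    for name, check in stages:
        try:
            passed = check(component)
        except Exception:
            passed = False  # a check that crashes counts as a failure
        if not passed:
            return name
    return None


def has_compute(component: object) -> bool:
    """Structural: the component exposes a callable `compute` (assumed method name)."""
    return callable(getattr(component, "compute", None))


def no_nan_output(component: object) -> bool:
    """Functional: a small sample input produces no NaN values."""
    return not any(math.isnan(x) for x in component.compute([1.0, 2.0, 3.0]))


STAGES: List[Stage] = [("structural", has_compute), ("functional", no_nan_output)]
```

Ordering the cheap structural check first means an expensive backtest-based quality gate would only run for components that already load and produce sane output.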

Analysis Dashboard

A private FastAPI + Plotly.js dashboard for deep analysis. Backtests are scanned from the filesystem, indexed, and made searchable. Every optimization run gets distribution charts, parameter sensitivity analysis, and constraint failure breakdowns.

helios-dashboard

$ helios dashboard --summary

Backtest Explorer    54,000+ runs indexed · sortable · filterable
Optimization View    66 sweeps · heatmaps · parameter sensitivity
Chart Suite          7 interactive types · equity · drawdown · scatter
Comparison Tool      Side-by-side metrics · parameter diffs · equity overlay

Walk-Forward Validation

Strategies are validated using time-series-aware methodology. No shuffling, no future data leakage — the same constraints a live trader faces.

In-Sample / Out-of-Sample

Chronological fold splitting preserves time-series integrity. Optimize on IS data, validate on OOS data per fold.

Monte Carlo Bootstrap

1,000 resamples generate confidence intervals for Sharpe, drawdown, and win rate. Flags: low_confidence, sequence_dependent.
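A bootstrap of this kind can be sketched with the standard library alone. The version below is an assumption-laden illustration: it resamples per-trade returns with replacement, uses an unannualized Sharpe (mean over sample standard deviation), and reports only Sharpe percentiles. The function names are hypothetical, and the `low_confidence` and `sequence_dependent` flags are not modeled here.

```python
import random
from statistics import mean, stdev
from typing import Dict, List


def sharpe(returns: List[float]) -> float:
    """Mean over sample standard deviation; unannualized, 0.0 if there is no variation."""
    sd = stdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0


def bootstrap_sharpe(
    returns: List[float], n_resamples: int = 1000, seed: int = 0
) -> Dict[str, float]:
    """Resample trades with replacement and report the 5th/50th/95th percentile Sharpe."""
    rng = random.Random(seed)  # seeded so repeated runs give the same intervals
    samples = sorted(
        sharpe(rng.choices(returns, k=len(returns))) for _ in range(n_resamples)
    )

    def pct(p: float) -> float:
        return samples[min(len(samples) - 1, int(p * len(samples)))]

    return {"p5": pct(0.05), "p50": pct(0.50), "p95": pct(0.95)}
```

Note that resampling individual trades discards their order, so a check for sequence dependence would need a different scheme, such as resampling contiguous blocks of trades.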

IS/OOS Fold Structure

Fold 1

IS: Train

OOS

Fold 2

IS: Train

OOS

Fold 3

IS: Train

OOS

→ No shuffling — preserves temporal causality
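The fold layout in the diagram can be approximated as below. This sketch assumes equal-length, non-overlapping folds with the OOS block taken from the end of each fold; the actual splitter may use anchored or overlapping windows instead, and `chronological_folds` and `oos_fraction` are hypothetical names.

```python
from typing import List, Tuple


def chronological_folds(
    n_samples: int, n_folds: int = 3, oos_fraction: float = 0.25
) -> List[Tuple[range, range]]:
    """Split indices into consecutive folds, each an IS block followed by its OOS block."""
    if n_folds < 1 or not 0 < oos_fraction < 1:
        raise ValueError("need n_folds >= 1 and 0 < oos_fraction < 1")
    fold_len = n_samples // n_folds
    oos_len = max(1, int(fold_len * oos_fraction))
    if fold_len - oos_len < 1:
        raise ValueError("not enough samples per fold")

    folds = []
    for k in range(n_folds):
        start = k * fold_len
        split = start + fold_len - oos_len
        # IS indices always precede OOS indices, so no future data reaches training.
        folds.append((range(start, split), range(split, start + fold_len)))
    return folds
```

With 120 samples and the defaults, the first fold trains on indices 0 to 29 and validates on 30 to 39; the indices are never shuffled.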

Cost Optimization

The deterministic-first architecture dramatically reduces LLM costs. Most monitoring cycles complete without a single API call.

93% cost reduction in steady-state monitoring

Scenario                       LLM-Only      Deterministic-First   Savings
Routine cycle (no issues)      $0.05–0.10    $0                    100%
Deployment validation (1 hr)   $1.50–3.00    $0–0.30               ~80%
Steady-state (24 hrs)          $12–30        $0–2.00               93%

Deterministic checks handle routine operations at zero cost. LLM analysts are invoked only when checks flag anomalies that require reasoning — typically <5% of monitoring cycles.
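One possible reading of the 93% figure is that it compares the upper ends of the steady-state ranges ($30 versus $2.00). That pairing is an assumption, not something the table states; the helper below just makes the arithmetic explicit.

```python
def savings(llm_only_cost: float, deterministic_first_cost: float) -> float:
    """Fractional cost reduction relative to the LLM-only baseline."""
    if llm_only_cost <= 0:
        raise ValueError("baseline cost must be positive")
    return 1.0 - deterministic_first_cost / llm_only_cost


# Steady-state row, upper bounds of both ranges: 1 - 2/30, which rounds to 93%.
steady_state = savings(30.0, 2.0)
```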

Code Architecture

Core

/core

Order · Position · Instrument · Account

Strategy

/strategy

PluginStrategy · BaseIndicator · BaseFilter

Execution

/execution

OrderExecutor · PositionManager · RiskController

Research

/research

ResearchAgent · StateMachine · BudgetTracker

Monitoring

/monitoring

Router · CheckRunner · AnalystDispatcher

Validation

/validation

WalkForward · MonteCarlo · FoldSplitter
