May 22, 2026 · VOL. 1
DISPATCHED ── FRI 22 MAY 2026 · 09:15 UTC
VOL. 1 — NO. 22
Editorial Standards · No. III

How The Future Express Measures Accuracy

We publish twenty-four articles a day. Each predicts a market outcome at a stated probability. Markets eventually resolve. We compare. Here is the math.

Updated 2026-05-22 · Resolved predictions: 0 · Active markets tracked: 493
§1 · The Score
Aggregate Brier Score
—.———
Score available after the first market resolves. Until then, we publish, log, and wait.
Scale: Perfect · 0.000 | Future Express | Random · 0.250

The Brier score is the mean squared error between the published probability and the observed outcome (1 for YES, 0 for NO). Zero is perfect. 0.25 is what a forecaster scores by always publishing 50%, the coin-flip baseline. Tetlock's superforecasters land between 0.10 and 0.15 on geopolitical questions.

The math
brier_per_market = (forecast_probability − outcome)²
mean_brier       = sum(brier_per_market) / count(resolved)

forecast_probability = articles.probabilityAtPublish / 100
outcome              = markets.resolutionOutcome === "yes" ? 1 : 0
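
A minimal TypeScript sketch of the same arithmetic. The field names follow the formulas above; the record shape (`ResolvedPrediction`) and the choice to return `null` before any market resolves are assumptions made for illustration, not the production schema.

```typescript
// Hypothetical record shape; the real schema may differ.
interface ResolvedPrediction {
  probabilityAtPublish: number; // published probability in percent, 0..100
  resolutionOutcome: "yes" | "no";
}

// Squared error for a single resolved market.
function brierPerMarket(p: ResolvedPrediction): number {
  const forecast = p.probabilityAtPublish / 100;
  const outcome = p.resolutionOutcome === "yes" ? 1 : 0;
  return (forecast - outcome) ** 2;
}

// Mean over resolved markets; null while nothing has resolved (assumed convention).
function meanBrier(resolved: ResolvedPrediction[]): number | null {
  if (resolved.length === 0) return null;
  const total = resolved.reduce((sum, p) => sum + brierPerMarket(p), 0);
  return total / resolved.length;
}
```

Under this sketch, a 70% forecast that resolves YES scores (0.7 − 1)² = 0.09; the same forecast resolving NO scores (0.7 − 0)² = 0.49.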

§2 · Calibration

Are 70% predictions resolving 70% of the time?

For each probability bucket, we plot the share of markets that actually resolved YES. A perfectly calibrated forecaster sits on the 45° line. Buckets with fewer than 5 resolved predictions are rendered as hatched bars and excluded from inference.

[Figure: calibration plot. X-axis: predicted probability (published), 0% to 100%. Y-axis: empirical frequency, 0% to 100%. Legend: perfect calibration (45°); bucket mean (n ≥ 5); insufficient data (n < 5).]
Bucket     n   Mean forecast   Empirical   Status
0–10%      0   -               -           No data
10–20%     0   -               -           No data
20–30%     0   -               -           No data
30–40%     0   -               -           No data
40–50%     0   -               -           No data
50–60%     0   -               -           No data
60–70%     0   -               -           No data
70–80%     0   -               -           No data
80–90%     0   -               -           No data
90–100%    0   -               -           No data
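
The bucketing behind this table can be sketched as follows. This is an illustration under stated assumptions, not the production code: the record shape, the names `CalibrationInput`, `BucketRow`, and `calibrationTable`, and the rule that a boundary value such as 70% falls into the higher bucket (with 100% kept in the top one) are all hypothetical.

```typescript
// Hypothetical input shape; field names borrowed from the Brier formulas.
interface CalibrationInput {
  probabilityAtPublish: number; // percent, 0..100
  resolutionOutcome: "yes" | "no";
}

interface BucketRow {
  label: string;               // e.g. "70-80%"
  n: number;                   // resolved predictions in the bucket
  meanForecast: number | null; // mean published probability, 0..1
  empirical: number | null;    // share that resolved YES, 0..1
  sufficient: boolean;         // false means a hatched bar, excluded from inference
}

const MIN_RESOLVED = 5; // threshold stated in the text above

function calibrationTable(resolved: CalibrationInput[]): BucketRow[] {
  const acc: { n: number; forecastSum: number; yes: number }[] = [];
  for (let i = 0; i < 10; i++) acc.push({ n: 0, forecastSum: 0, yes: 0 });

  for (const p of resolved) {
    // Assumption: boundaries round up (70 goes to 70-80%), and 100 stays in the top bucket.
    const idx = Math.min(9, Math.max(0, Math.floor(p.probabilityAtPublish / 10)));
    const bucket = acc[idx];
    bucket.n += 1;
    bucket.forecastSum += p.probabilityAtPublish / 100;
    if (p.resolutionOutcome === "yes") bucket.yes += 1;
  }

  return acc.map((b, i) => ({
    label: `${i * 10}-${(i + 1) * 10}%`,
    n: b.n,
    meanForecast: b.n > 0 ? b.forecastSum / b.n : null,
    empirical: b.n > 0 ? b.yes / b.n : null,
    sufficient: b.n >= MIN_RESOLVED,
  }));
}
```

A bucket is well calibrated when `empirical` sits close to `meanForecast`; the gap between the two is the distance from the 45° line.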

§3 · Track record by desk

Where we're sharp, where we're soft.

Brier is a strictly proper scoring rule: the expected score is minimized only by reporting one's true probability, and lower is always better. Sports markets generally resolve more cleanly than politics; entertainment is dominated by award shows; crypto is volatile but oracle-clean.

Desk            n   Brier   Best call   Worst call
Politics        0   -       -           -
Economy         0   -       -           -
Crypto          0   -       -           -
Sports          0   -       -           -
Science         0   -       -           -
Entertainment   0   -       -           -
World           0   -       -           -

No resolved predictions yet. The table will populate as Polymarket and Kalshi markets settle.
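
One way the desk table could be filled once markets settle, sketched in TypeScript. The source does not define "best call" and "worst call"; this sketch assumes they are the articles with the lowest and highest per-market Brier score on each desk. The record shape and the names `DeskPrediction`, `DeskRow`, and `trackRecordByDesk` are hypothetical.

```typescript
// Hypothetical record shape for a resolved article.
interface DeskPrediction {
  desk: string;                 // e.g. "Politics"
  headline: string;
  probabilityAtPublish: number; // percent, 0..100
  resolutionOutcome: "yes" | "no";
}

interface DeskRow {
  desk: string;
  n: number;
  brier: number;     // mean Brier for the desk
  bestCall: string;  // assumed: headline with the lowest per-market Brier
  worstCall: string; // assumed: headline with the highest per-market Brier
}

function trackRecordByDesk(resolved: DeskPrediction[]): DeskRow[] {
  const byDesk = new Map<string, { headline: string; brier: number }[]>();
  for (const p of resolved) {
    const outcome = p.resolutionOutcome === "yes" ? 1 : 0;
    const brier = (p.probabilityAtPublish / 100 - outcome) ** 2;
    const scored = byDesk.get(p.desk) ?? [];
    scored.push({ headline: p.headline, brier });
    byDesk.set(p.desk, scored);
  }

  const rows: DeskRow[] = [];
  byDesk.forEach((scored, desk) => {
    // Sort a copy ascending by Brier so the first entry is the sharpest call.
    const sorted = scored.slice().sort((a, b) => a.brier - b.brier);
    const mean = sorted.reduce((sum, s) => sum + s.brier, 0) / sorted.length;
    rows.push({
      desk,
      n: sorted.length,
      brier: mean,
      bestCall: sorted[0].headline,
      worstCall: sorted[sorted.length - 1].headline,
    });
  });
  return rows;
}
```

Desks with no resolved predictions simply do not appear in the output, which matches the empty rows shown above.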


§4 · How we generate articles

From order book to broadsheet.

Sources. Every probability we publish is a live read from one of two prediction-market venues: Polymarket (USDC, Polygon-settled) and Kalshi (CFTC-regulated, USD). When both venues quote the same question, we report the volume-weighted blend and disclose the spread. Order-book data is pulled hourly via their public Gamma and v2 APIs.
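
A sketch of how a volume-weighted blend could be computed, assuming each venue's quote has already been reduced to a YES probability and a traded volume in a common unit. The names `VenueQuote` and `blendQuotes` are hypothetical, and "spread" is read here as the gap between the highest and lowest venue quote, which is an assumption rather than a confirmed definition.

```typescript
// Hypothetical normalized quote; not the venues' API response shape.
interface VenueQuote {
  venue: string;       // e.g. "polymarket" or "kalshi"
  probability: number; // implied YES probability, 0..1
  volume: number;      // traded volume, assumed converted to one unit (e.g. USD)
}

interface BlendedQuote {
  blended: number; // volume-weighted mean probability
  spread: number;  // assumed: max quote minus min quote across venues
}

function blendQuotes(quotes: VenueQuote[]): BlendedQuote | null {
  const totalVolume = quotes.reduce((sum, q) => sum + q.volume, 0);
  // Without quotes or volume there is nothing to weight by.
  if (quotes.length === 0 || totalVolume <= 0) return null;

  const weighted = quotes.reduce((sum, q) => sum + q.probability * q.volume, 0);
  const probabilities = quotes.map((q) => q.probability);
  return {
    blended: weighted / totalVolume,
    spread: Math.max(...probabilities) - Math.min(...probabilities),
  };
}
```

For example, a 0.60 quote on 3,000 of volume and a 0.64 quote on 1,000 blend to (1,800 + 640) / 4,000 = 0.61, with a spread of 0.04. The thinner venue moves the blend less, which is the point of weighting by volume.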

Model stack. Article prose is drafted by a multi-provider fallback chain configured by LLM_PROVIDER_PRIORITY. In production order: Anthropic Claude Sonnet 4.6, OpenRouter (arcee-ai/trinity-mini), OpenAI gpt-4o-mini, and the 0G Compute Network (llama-3.3) as a decentralized fallback. If a provider returns a 5xx error, the next one picks up the same prompt: no silent degradation, no cached fakes. Image illustrations use sourceful/riverflow-v2-fast with a 1920s halftone style transfer. Every article carries the model name in its FILED line.
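
The fallback behavior might look roughly like the sketch below. Everything here is illustrative: the `Provider` interface, the `ProviderResult` type, and `draftWithFallback` are hypothetical names, no real SDK is called, and the format of LLM_PROVIDER_PRIORITY is not shown in the source, so the order is passed in as an array. The text only specifies what happens on a 5xx; failing loudly on any other error status is an assumption.

```typescript
// Hypothetical result type: a provider either returns text or an HTTP-style status.
type ProviderResult =
  | { ok: true; text: string }
  | { ok: false; status: number };

interface Provider {
  name: string; // model name recorded with the draft
  generate(prompt: string): Promise<ProviderResult>;
}

interface Draft {
  text: string;
  model: string; // the provider that actually produced the text
}

// Try providers in priority order, handing each one the same prompt.
async function draftWithFallback(providers: Provider[], prompt: string): Promise<Draft> {
  for (const provider of providers) {
    const result = await provider.generate(prompt);
    if (result.ok) {
      return { text: result.text, model: provider.name };
    }
    if (result.status < 500) {
      // Assumption: non-5xx errors are not retried elsewhere; they surface immediately.
      throw new Error(`${provider.name} failed with status ${result.status}`);
    }
    // 5xx: fall through to the next provider with the unchanged prompt.
  }
  // No cached or placeholder text is returned when the whole chain is down.
  throw new Error("All providers in the chain failed");
}
```

Returning the provider name alongside the text is what would let the model be recorded on the article, in the spirit of the FILED line.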

Web research. Before a draft is written, the editor pulls 5–10 corroborating sources via Tavily (semantic search over recent news) and Brave Search (general web fallback). Snippets are passed into the prompt as context; URLs are stored on the article record and rendered in the "Sources" rail. We do not copy paywalled article bodies; we keep only the headline, abstract, and URL.

The contrarian take. Every article ships with a Contrarian Take field — a separate generation that argues the case against the market's implied direction. This is not editorial cosplay; it's a hedge against the well-documented bias of LLMs to launder consensus into prose. If the market says 78% YES, the contrarian paragraph argues why the 22% NO is the sharper bet. Finance pros tell us this is the field they read first.

What we don't cover. We exclude markets that incentivize harm: assassination markets, markets on individuals' deaths, doxxing-adjacent markets, and any market whose resolution criterion would reward violence. We also skip markets with an obvious oracle-capture problem (e.g. "will I tweet X by Friday" from a market creator's own account). The exclusion list is hand-curated and updated when new patterns appear.


§5 · Limitations

What we can't claim.

  • We sample from speculative markets. Polymarket and Kalshi are thinner than the S&P. Liquidity bias, longshot bias, and ideological clustering are well-documented. A market quote is a price, not a probability — we treat it as the latter only because no better real-time signal exists at our latency budget.
  • Resolution is bound to oracles. Our outcome variable comes directly from each venue's resolution. If a Polymarket UMA dispute lands the wrong way, our Brier score for that market lands the wrong way too. We don't adjudicate resolutions ourselves.
  • Article generation introduces bias. The probability we cite is sourced; the prose around it is generated. LLMs anchor on the probability and tend to over-justify it. The Contrarian Take is the structural mitigation; we acknowledge it is incomplete.
  • We don't forecast the world. We publish what these markets currently believe. Aggregate Brier improves the credibility of the publication; it does not retroactively transform any single article into a prophecy.

§6 · Cite us

For journalists, researchers, and the agentic web.

Pre-formatted citations. If you're building on the data, the JSON feed and OpenAPI spec live at /llms.txt.

Plain
The Future Express, retrieved 2026-05-22, https://thefutureexpress.com/methodology
Academic
The Future Express. (2026). Methodology. Retrieved from https://thefutureexpress.com/methodology
Markdown
[The Future Express — Methodology](https://thefutureexpress.com/methodology)