Editorial Standards · No. III

How The Future Express Measures Accuracy

We publish twenty-four articles a day. Each predicts a market outcome at a stated probability. Markets eventually resolve. We compare. Here is the math.

Updated 2026-07-12 · Resolved predictions: 0 · Active markets tracked: 580

§1 · The Score

Aggregate Brier Score

—.———

Score available after the first market resolves. Until then, we publish, log, and wait.

Scale: Perfect · 0.000 | Future Express | Random · 0.250

Brier score is the squared error between the published probability and the observed outcome, averaged over resolved markets. Zero is perfect. 0.25 is what a forecaster scores by answering 50% on every question, the coin-flip baseline. Tetlock's superforecasters land between 0.10 and 0.15 on geopolitical questions.

The math

brier_per_market = (forecast_probability − outcome)²
mean_brier       = sum(brier_per_market) / count(resolved)

forecast_probability = articles.probabilityAtPublish / 100
outcome              = markets.resolutionOutcome === "yes" ? 1 : 0
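A minimal TypeScript sketch of the two formulas above. The `ResolvedPrediction` shape and the function names are assumptions for illustration; only `probabilityAtPublish` and `resolutionOutcome` come from the definitions above. Returning `null` for an empty set is one way to represent the placeholder score shown before any market resolves.

```typescript
// Hypothetical record shape; field names mirror the formulas above.
interface ResolvedPrediction {
  probabilityAtPublish: number; // percent, 0 to 100
  resolutionOutcome: "yes" | "no";
}

// Squared error for a single resolved market.
function brierPerMarket(p: ResolvedPrediction): number {
  const forecast = p.probabilityAtPublish / 100;
  const outcome = p.resolutionOutcome === "yes" ? 1 : 0;
  return (forecast - outcome) ** 2;
}

// Mean over resolved markets; null when nothing has resolved yet.
function meanBrier(resolved: ResolvedPrediction[]): number | null {
  if (resolved.length === 0) return null;
  const total = resolved.reduce((sum, p) => sum + brierPerMarket(p), 0);
  return total / resolved.length;
}
```

For example, a 70% forecast that resolves YES scores (0.7 − 1)² = 0.09, and a 20% forecast that resolves NO scores (0.2 − 0)² = 0.04, for a mean of 0.065.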

§2 · Calibration

Are 70% predictions resolving 70% of the time?

For each probability bucket, we plot the share of markets that actually resolved YES. A perfectly calibrated forecaster sits on the 45° line. Buckets with fewer than 5 resolved predictions are rendered as hatched bars and excluded from inference.

Bucket	Mean forecast	Empirical	Status
0–10%	—	—	No data
10–20%	—	—	No data
20–30%	—	—	No data
30–40%	—	—	No data
40–50%	—	—	No data
50–60%	—	—	No data
60–70%	—	—	No data
70–80%	—	—	No data
80–90%	—	—	No data
90–100%	—	—	No data
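The bucketing behind this table could be sketched as follows. This is an illustration under stated assumptions, not the production code: the type and function names are hypothetical, and the edge handling (a forecast of exactly 70% counts toward 70–80%, and exactly 100% counts toward 90–100%) is a guess the text above does not settle.

```typescript
// Hypothetical record shape, reusing the field names from §1.
interface ResolvedPrediction {
  probabilityAtPublish: number; // percent, 0 to 100
  resolutionOutcome: "yes" | "no";
}

interface CalibrationRow {
  bucket: string;
  count: number;
  meanForecast: number | null; // 0..1, null when the bucket is empty
  empirical: number | null;    // share resolved YES, null when empty
  sufficient: boolean;         // false means hatched and excluded from inference
}

const MIN_RESOLVED = 5;
const BUCKET_COUNT = 10;

function calibrationTable(resolved: ResolvedPrediction[]): CalibrationRow[] {
  const rows: CalibrationRow[] = [];
  for (let i = 0; i < BUCKET_COUNT; i++) {
    const lo = i * 10;
    const hi = lo + 10;
    // Assumption: buckets are half-open [lo, hi); the last one also includes 100.
    const inBucket = resolved.filter((p) => {
      const pct = p.probabilityAtPublish;
      return pct >= lo && (pct < hi || (i === BUCKET_COUNT - 1 && pct === 100));
    });
    const n = inBucket.length;
    const forecastSum = inBucket.reduce((s, p) => s + p.probabilityAtPublish / 100, 0);
    const yesCount = inBucket.filter((p) => p.resolutionOutcome === "yes").length;
    rows.push({
      bucket: `${lo}–${hi}%`,
      count: n,
      meanForecast: n > 0 ? forecastSum / n : null,
      empirical: n > 0 ? yesCount / n : null,
      sufficient: n >= MIN_RESOLVED,
    });
  }
  return rows;
}
```

A bucket is well calibrated when `empirical` sits close to `meanForecast`; rows with `sufficient: false` correspond to the hatched bars.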

§3 · Track record by desk

Where we're sharp, where we're soft.

Brier is a strictly proper scoring rule: a forecaster gets the best expected score only by reporting honest probabilities, and lower is always better. Sports markets generally resolve more cleanly than politics; entertainment is dominated by award shows; crypto is volatile but oracle-clean.

Desk	Brier	Best call	Worst call
Politics	—	—	—
Economy	—	—	—
Crypto	—	—	—
Sports	—	—	—
Science	—	—	—
Entertainment	—	—	—
World	—	—	—

No resolved predictions yet. The table will populate as Polymarket and Kalshi markets settle.
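Once markets settle, the desk table could be filled along these lines. This is a sketch only: the `desk` and `headline` fields and all names are hypothetical, and treating "best call" as the lowest per-market Brier and "worst call" as the highest is an assumption about how those columns are defined.

```typescript
// Hypothetical record shape; desk and headline are illustrative fields.
interface DeskPrediction {
  desk: string;
  headline: string;
  probabilityAtPublish: number; // percent, 0 to 100
  resolutionOutcome: "yes" | "no";
}

interface DeskRow {
  desk: string;
  brier: number;
  bestCall: string;
  worstCall: string;
}

function deskTable(resolved: DeskPrediction[]): DeskRow[] {
  // Group per-market Brier scores by desk.
  const scored: Record<string, { headline: string; score: number }[]> = {};
  for (const p of resolved) {
    const outcome = p.resolutionOutcome === "yes" ? 1 : 0;
    const score = (p.probabilityAtPublish / 100 - outcome) ** 2;
    const calls = scored[p.desk] ?? [];
    calls.push({ headline: p.headline, score });
    scored[p.desk] = calls;
  }
  return Object.keys(scored).map((desk) => {
    // Sort ascending so the first entry is the best call and the last is the worst.
    const calls = scored[desk].slice().sort((a, b) => a.score - b.score);
    const mean = calls.reduce((sum, c) => sum + c.score, 0) / calls.length;
    return {
      desk,
      brier: mean,
      bestCall: calls[0].headline,
      worstCall: calls[calls.length - 1].headline,
    };
  });
}
```

Desks with no resolved predictions simply do not appear in the result, which matches the dashes shown above.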

§4 · How we generate articles

From order book to broadsheet.

Sources. Every probability we publish is a live read from one of two prediction-market venues: Polymarket (USDC, Polygon-settled) and Kalshi (CFTC-regulated, USD). When both venues quote the same question, we report the volume-weighted blend and disclose the spread. Order-book data is pulled hourly via their public Gamma and v2 APIs.
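One way the volume-weighted blend might be computed, as a sketch. The `VenueQuote` shape and function name are hypothetical, volumes are assumed to be expressed in a comparable unit across venues, and "spread" is read here as the absolute gap between the two venues' quotes, which is an interpretation rather than something stated above.

```typescript
// Hypothetical quote shape for one venue's view of a question.
interface VenueQuote {
  venue: "polymarket" | "kalshi";
  probability: number; // implied YES probability, 0..1
  volume: number;      // traded volume, assumed comparable across venues
}

interface BlendedQuote {
  probability: number; // volume-weighted YES probability
  spread: number;      // absolute gap between the two venue quotes
}

function blendQuotes(a: VenueQuote, b: VenueQuote): BlendedQuote {
  const totalVolume = a.volume + b.volume;
  // Assumption: fall back to a simple average when neither venue reports volume.
  const probability =
    totalVolume > 0
      ? (a.probability * a.volume + b.probability * b.volume) / totalVolume
      : (a.probability + b.probability) / 2;
  return { probability, spread: Math.abs(a.probability - b.probability) };
}
```

With one venue at 60% on 300 units of volume and the other at 70% on 100 units, the blend is (0.60 × 300 + 0.70 × 100) / 400 = 62.5%, with a 10-point spread.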

Model stack. Article prose is drafted by a multi-provider fallback chain configured by LLM_PROVIDER_PRIORITY. In production order: Anthropic Claude Sonnet 4.6, OpenRouter (arcee-ai/trinity-mini), OpenAI gpt-4o-mini, and the 0G Compute Network (llama-3.3) as a decentralized fallback. If a provider returns a 5xx error, the next one picks up the same prompt: no silent degradation, no cached fakes. Image illustrations use sourceful/riverflow-v2-fast with a 1920s halftone style transfer. Every article carries the model name in its FILED line.
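The fallback behavior might look roughly like this. It is a sketch, not the production chain: the `Provider` interface, the `generate` signature, and the convention that failures carry a numeric `status` field are all assumptions, and real provider SDKs differ. Rethrowing non-5xx errors instead of moving on is also a design guess.

```typescript
// Hypothetical provider wrapper; real SDK signatures differ per vendor.
interface Provider {
  name: string;
  generate: (prompt: string) => Promise<string>;
}

interface Draft {
  text: string;
  model: string; // recorded so the article can carry it in its FILED line
}

// Read an HTTP-style status off a thrown value, if it has one.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Try providers in priority order, passing the same prompt to each.
async function draftWithFallback(providers: Provider[], prompt: string): Promise<Draft> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.generate(prompt);
      return { text, model: provider.name };
    } catch (err) {
      const status = statusOf(err);
      // Assumption: only 5xx responses advance the chain; anything else is rethrown.
      if (status === undefined || status < 500 || status > 599) throw err;
      failures.push(`${provider.name}: ${status}`);
    }
  }
  throw new Error(`All providers failed (${failures.join(", ")})`);
}
```

If every provider fails, the function throws rather than returning a stale or placeholder draft, in keeping with the "no cached fakes" rule.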

Web research. Before a draft is written, the editor pulls 5–10 corroborating sources via Tavily (semantic search over recent news) and Brave Search (general web fallback). Snippets are passed into the prompt as context; URLs are stored on the article record and rendered in the "Sources" rail. We do not store paywalled article bodies, only the headline, abstract, and URL.

The contrarian take. Every article ships with a Contrarian Take field — a separate generation that argues the case against the market's implied direction. This is not editorial cosplay; it's a hedge against the well-documented bias of LLMs to launder consensus into prose. If the market says 78% YES, the contrarian paragraph argues why the 22% NO is the sharper bet. Finance pros tell us this is the field they read first.

What we don't cover. We exclude markets that incentivize harm: assassination markets, markets on individuals' deaths, doxxing-adjacent markets, and any market whose resolution criterion would reward violence. We also skip markets with an obvious oracle-capture problem (e.g. "will I tweet X by Friday" from a market creator's own account). The exclusion list is hand-curated and updated when new patterns appear.

§5 · Limitations

What we can't claim.

We sample from speculative markets. Polymarket and Kalshi are thinner than the S&P. Liquidity bias, longshot bias, and ideological clustering are well-documented. A market quote is a price, not a probability — we treat it as the latter only because no better real-time signal exists at our latency budget.
Resolution is bound to oracles. Our outcome variable comes directly from each venue's resolution. If a Polymarket UMA dispute lands the wrong way, our Brier score for that market lands the wrong way too. We don't adjudicate resolutions ourselves.
Article generation introduces bias. The probability we cite is sourced; the prose around it is generated. LLMs anchor on the probability and tend to over-justify it. The Contrarian Take is the structural mitigation; we acknowledge it is incomplete.
We don't forecast the world. We publish what these markets currently believe. A low aggregate Brier score supports the credibility of the publication; it does not retroactively transform any single article into a prophecy.

§6 · Cite us

For journalists, researchers, and the agentic web.

Pre-formatted citations. If you're building on the data, the JSON feed and OpenAPI spec live at /llms.txt.

Plain

The Future Express, retrieved 2026-07-12, https://thefutureexpress.com/methodology

Academic

The Future Express. (2026). Methodology. Retrieved from https://thefutureexpress.com/methodology

Markdown