How The Future Express Measures Accuracy
We publish twenty-four articles a day. Each predicts a market outcome at a stated probability. Markets eventually resolve. We compare. Here is the math.
Brier score is the squared error between the published probability and the observed outcome. Zero is perfect. 0.25 is the score a coin would get: a forecaster who always says 50% scores (0.5 − 1)² = (0.5 − 0)² = 0.25 on every market. Tetlock's superforecasters land between 0.10 and 0.15 on geopolitical questions.
The math
brier_per_market = (forecast_probability − outcome)²
mean_brier = sum(brier_per_market) / count(resolved)
forecast_probability = articles.probabilityAtPublish / 100
outcome = markets.resolutionOutcome === "yes" ? 1 : 0
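The formulas above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the production code: the `ResolvedPrediction` shape is hypothetical (it flattens the `articles` and `markets` fields named above into one record), and returning `null` for an empty set is an assumption about how "no data" is handled.

```typescript
// Hypothetical record shape: field names follow the formulas above,
// but the real schema may differ.
interface ResolvedPrediction {
  probabilityAtPublish: number; // 0-100, as published
  resolutionOutcome: "yes" | "no";
}

// Squared error between the forecast (rescaled to 0-1) and the 0/1 outcome.
function brierPerMarket(p: ResolvedPrediction): number {
  const forecast = p.probabilityAtPublish / 100;
  const outcome = p.resolutionOutcome === "yes" ? 1 : 0;
  return (forecast - outcome) ** 2;
}

// Mean over resolved markets. Returns null (an assumption) when nothing
// has resolved yet, rather than dividing by zero.
function meanBrier(resolved: ResolvedPrediction[]): number | null {
  if (resolved.length === 0) return null;
  const total = resolved.reduce((sum, p) => sum + brierPerMarket(p), 0);
  return total / resolved.length;
}
```

A 50% forecast scores 0.25 whichever way the market resolves, which is where the coin-flip baseline comes from.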
Are 70% predictions resolving 70% of the time?
For each probability bucket, we plot the share of markets that actually resolved YES. A perfectly calibrated forecaster sits on the 45° line. Buckets with fewer than 5 resolved predictions are rendered as hatched bars and excluded from inference.
| Bucket | n | Mean forecast | Empirical | Status |
|---|---|---|---|---|
| 0–10% | 0 | — | — | No data |
| 10–20% | 0 | — | — | No data |
| 20–30% | 0 | — | — | No data |
| 30–40% | 0 | — | — | No data |
| 40–50% | 0 | — | — | No data |
| 50–60% | 0 | — | — | No data |
| 60–70% | 0 | — | — | No data |
| 70–80% | 0 | — | — | No data |
| 80–90% | 0 | — | — | No data |
| 90–100% | 0 | — | — | No data |
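One way the calibration table could be computed is sketched below. The record shape and function names are hypothetical. The boundary rule (a forecast of exactly 70 lands in 70–80%, and 100 is clamped into 90–100%) is an assumption, since the page does not say how edges are assigned. The minimum of 5 resolved predictions comes from the text above.

```typescript
// Hypothetical record shape, mirroring the fields used in the Brier formulas.
interface ResolvedPrediction {
  probabilityAtPublish: number; // 0-100
  resolutionOutcome: "yes" | "no";
}

interface CalibrationBucket {
  label: string;
  n: number;
  meanForecast: number | null; // null when the bucket is empty
  empirical: number | null;    // share that resolved YES
  sufficient: boolean;         // false => hatched bar, excluded from inference
}

const MIN_BUCKET_SIZE = 5; // threshold stated in the text

function calibrationTable(resolved: ResolvedPrediction[]): CalibrationBucket[] {
  const acc: { n: number; forecastSum: number; yes: number }[] = [];
  for (let i = 0; i < 10; i++) acc.push({ n: 0, forecastSum: 0, yes: 0 });

  for (const p of resolved) {
    // Assumed edge rule: lower bound inclusive, 100 clamped into the top bucket.
    const i = Math.min(9, Math.max(0, Math.floor(p.probabilityAtPublish / 10)));
    acc[i].n += 1;
    acc[i].forecastSum += p.probabilityAtPublish / 100;
    if (p.resolutionOutcome === "yes") acc[i].yes += 1;
  }

  return acc.map((b, i) => ({
    label: `${i * 10}–${(i + 1) * 10}%`,
    n: b.n,
    meanForecast: b.n > 0 ? b.forecastSum / b.n : null,
    empirical: b.n > 0 ? b.yes / b.n : null,
    sufficient: b.n >= MIN_BUCKET_SIZE,
  }));
}
```

A bucket is well calibrated when `empirical` sits close to `meanForecast`; the gap between the two is the distance from the 45° line.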
Where we're sharp, where we're soft.
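Brier is a loss, so smaller is always better. It is also a strictly proper scoring rule: the expected score is minimized only by reporting the true probability, so there is nothing to gain from shading a forecast. Sports markets generally resolve cleaner than politics; entertainment is dominated by award shows; crypto is volatile but oracle-clean.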
| Desk | n | Brier | Best call | Worst call |
|---|---|---|---|---|
| Politics | 0 | — | — | — |
| Economy | 0 | — | — | — |
| Crypto | 0 | — | — | — |
| Sports | 0 | — | — | — |
| Science | 0 | — | — | — |
| Entertainment | 0 | — | — | — |
| World | 0 | — | — | — |
No resolved predictions yet. The table will populate as Polymarket and Kalshi markets settle.
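Once markets resolve, the per-desk rows could be filled in along these lines. This is a sketch under assumptions: the `DeskPrediction` shape and its `headline` field are hypothetical, and "best call" and "worst call" are taken to mean the articles with the lowest and highest per-market Brier score, which the page does not define explicitly.

```typescript
// Hypothetical record shape; `desk` and `headline` are illustrative names.
interface DeskPrediction {
  desk: string;
  headline: string;
  probabilityAtPublish: number; // 0-100
  resolutionOutcome: "yes" | "no";
}

interface DeskSummary {
  desk: string;
  n: number;
  brier: number;
  bestCall: string;  // assumed: lowest per-market Brier
  worstCall: string; // assumed: highest per-market Brier
}

function summarizeByDesk(resolved: DeskPrediction[]): DeskSummary[] {
  // Group per-market scores by desk.
  const groups = new Map<string, { headline: string; score: number }[]>();
  for (const p of resolved) {
    const outcome = p.resolutionOutcome === "yes" ? 1 : 0;
    const score = (p.probabilityAtPublish / 100 - outcome) ** 2;
    const rows = groups.get(p.desk) ?? [];
    rows.push({ headline: p.headline, score });
    groups.set(p.desk, rows);
  }

  const summaries: DeskSummary[] = [];
  groups.forEach((rows, desk) => {
    // Sort a copy ascending by score so best is first and worst is last.
    const sorted = rows.slice().sort((a, b) => a.score - b.score);
    const total = rows.reduce((sum, r) => sum + r.score, 0);
    summaries.push({
      desk,
      n: rows.length,
      brier: total / rows.length,
      bestCall: sorted[0].headline,
      worstCall: sorted[sorted.length - 1].headline,
    });
  });
  return summaries;
}
```

Desks with no resolved predictions simply do not appear in the result, which corresponds to the dash rows in the table.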
From order book to broadsheet.
Sources. Every probability we publish is a live read from one of two prediction-market venues: Polymarket (USDC, Polygon-settled) and Kalshi (CFTC-regulated, USD). When both venues quote the same question, we report the volume-weighted blend and disclose the spread. Order-book data is pulled hourly via their public Gamma and v2 APIs.
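A volume-weighted blend of two venue quotes might look like the sketch below. The `VenueQuote` shape is hypothetical, volumes are assumed to be expressed in comparable units, and "spread" is assumed here to mean the absolute difference between the two venues' quoted probabilities. The fallback to a simple average when neither venue reports volume is also an assumption.

```typescript
// Hypothetical quote shape; not taken from either venue's API.
interface VenueQuote {
  venue: "polymarket" | "kalshi";
  probability: number; // 0-1
  volume: number;      // assumed comparable across venues (e.g. USD notional)
}

function blendQuotes(
  a: VenueQuote,
  b: VenueQuote
): { blended: number; spread: number } {
  const totalVolume = a.volume + b.volume;
  // Weight each quote by its share of total volume; with no volume on
  // either side, fall back to a plain average (assumption).
  const blended =
    totalVolume > 0
      ? (a.probability * a.volume + b.probability * b.volume) / totalVolume
      : (a.probability + b.probability) / 2;
  // Assumed definition of spread: cross-venue disagreement.
  const spread = Math.abs(a.probability - b.probability);
  return { blended, spread };
}
```

Weighting by volume lets the deeper market dominate the blend, while the disclosed spread shows how far the venues disagree.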
Model stack. Article prose is drafted by a multi-provider fallback chain configured by LLM_PROVIDER_PRIORITY. In production order: Anthropic Claude Sonnet 4.6, OpenRouter (arcee-ai/trinity-mini), OpenAI gpt-4o-mini, and the 0G Compute Network (llama-3.3) as a decentralized fallback. If a provider returns a 5xx error, the next one picks up the same prompt: no silent degradation, no cached fakes. Image illustrations use sourceful/riverflow-v2-fast with a 1920s halftone style transfer. Every article carries the model name in its FILED line.
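The fallback behaviour described above can be sketched as a loop over providers in priority order. Everything named here is hypothetical: the `Provider` interface, the convention that a failed call throws an object carrying a numeric `status`, and the choice to rethrow non-5xx errors instead of falling through. No real provider SDK is called.

```typescript
// Hypothetical provider abstraction; real SDK clients would be wrapped to fit it.
interface Provider {
  name: string;
  generate: (prompt: string) => Promise<string>;
}

interface Draft {
  text: string;
  model: string; // recorded so the article can carry it in its FILED line
}

async function draftWithFallback(
  providers: Provider[],
  prompt: string
): Promise<Draft> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      // Every provider receives the same, unmodified prompt.
      const text = await provider.generate(prompt);
      return { text, model: provider.name };
    } catch (err) {
      // Assumed error convention: thrown value has a numeric `status`.
      const status =
        typeof err === "object" && err !== null
          ? (err as { status?: unknown }).status
          : undefined;
      const isServerError =
        typeof status === "number" && status >= 500 && status < 600;
      // Only 5xx responses trigger fallback; anything else surfaces at once.
      if (!isServerError) throw err;
      failures.push(`${provider.name}: ${status}`);
    }
  }
  // No cached or placeholder text: if every provider fails, the draft fails.
  throw new Error(`All providers failed (${failures.join(", ")})`);
}
```

Returning the provider name alongside the text is what makes it possible to attribute each article to the model that actually wrote it.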
Web research. Before a draft is written, the editor pulls 5–10 corroborating sources via Tavily (semantic search over recent news) and Brave Search (general web fallback). Snippets are passed into the prompt as context; URLs are stored on the article record and rendered in the "Sources" rail. We do not copy paywalled article bodies; we keep only the headline, abstract, and URL.
The contrarian take. Every article ships with a Contrarian Take field — a separate generation that argues the case against the market's implied direction. This is not editorial cosplay; it's a hedge against the well-documented bias of LLMs to launder consensus into prose. If the market says 78% YES, the contrarian paragraph argues why the 22% NO is the sharper bet. Finance pros tell us this is the field they read first.
What we don't cover. We exclude markets that incentivize harm: assassination markets, markets on individuals' deaths, doxxing-adjacent markets, and any market whose resolution criterion would reward violence. We also skip markets with an obvious oracle-capture problem (e.g. "will I tweet X by Friday" from a market creator's own account). The exclusion list is hand-curated and updated when new patterns appear.
What we can't claim.
- We sample from speculative markets. Polymarket and Kalshi are thinner than the S&P. Liquidity bias, longshot bias, and ideological clustering are well-documented. A market quote is a price, not a probability — we treat it as the latter only because no better real-time signal exists at our latency budget.
- Resolution is bound to oracles. Our outcome variable comes directly from each venue's resolution. If a Polymarket UMA dispute lands the wrong way, our Brier score for that market lands the wrong way too. We don't adjudicate resolutions ourselves.
- Article generation introduces bias. The probability we cite is sourced; the prose around it is generated. LLMs anchor on the probability and tend to over-justify it. The Contrarian Take is the structural mitigation; we acknowledge it is incomplete.
- We don't forecast the world. We publish what these markets currently believe. Aggregate Brier improves the credibility of the publication; it does not retroactively transform any single article into a prophecy.
For journalists, researchers, and the agentic web.
Pre-formatted citations. Click to copy. If you're building on the data, the JSON feed and OpenAPI spec live at /llms.txt.