Scoring Methodology
In short:
DAATAN ranks forecasters with twelve scoring systems, each measuring a different facet of skill. Chief among them are raw accuracy, calibration (Brier Score), head-to-head strength (ELO), and uncertainty-adjusted skill (Glicko-2). Every system except Reputation Score can be filtered to a single topic via the leaderboard's ?tag= parameter. This page defines each system, gives its formula, and works through numeric examples for Brier Score, ELO, and Glicko-2.
Brier Score — calibration
The Brier Score measures how well a forecaster's stated confidence matches reality. It is the squared difference between the probability a user assigned and the actual outcome, where the outcome is 1 if the forecast resolves "yes" and 0 if it resolves "no." Lower is better: a perfect forecaster scores 0.
brierScore = (probability − outcome)²
Worked example
A user commits at 75% confidence that a forecast will resolve "yes." It does, which gives the first row below; the second row shows the score had it resolved "no" instead.
| Outcome | Calculation | Brier Score |
|---|---|---|
| Resolves "yes" (as predicted) | (0.75 − 1)² | 0.0625 |
| Resolves "no" (against the prediction) | (0.75 − 0)² | 0.5625 |
Here the confident-and-wrong outcome is penalized nine times as heavily as the confident-and-right one (0.5625 vs. 0.0625). Because the penalty grows with the square of the error, Brier Score punishes overconfidence, which forces honest use of the confidence slider instead of everyone claiming 99%.
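The worked example can be reproduced with a few lines of Python. This is a minimal sketch, not DAATAN's implementation: the function name `brier_score` and the input checks are assumptions added for illustration, while the formula itself is the one given above.

```python
def brier_score(probability: float, outcome: int) -> float:
    """Squared difference between the stated probability and the outcome.

    outcome is 1 if the forecast resolved "yes", 0 if it resolved "no".
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (probability - outcome) ** 2


right = brier_score(0.75, 1)  # forecast resolves "yes", as predicted
wrong = brier_score(0.75, 0)  # forecast resolves "no"

print(right)          # 0.0625
print(wrong)          # 0.5625
print(wrong / right)  # 9.0
```

The ratio of 9 is specific to a 75% commitment; at 99% the same ratio would be far larger, which is the quadratic penalty at work.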
ELO Rating — head-to-head strength
DAATAN uses the same ELO rating system as chess. When two users commit to the same forecast, the one with the lower Brier Score (closer to the truth) takes rating points from the other. Higher is better.
expected_A = 1 / (1 + 10^((elo_B − elo_A) / 400))
delta_A = K × (actual_A − expected_A) // K = 32
actual_A = 1 if brier_A < brier_B, 0 if worse, 0.5 if tied
Worked example
User A (ELO 1500) and User B (ELO 1600) both commit to the same forecast. A's call turns out closer to the truth.
| Step | Value |
|---|---|
| A's expected score (as the lower-rated player) | 0.360 |
| A wins → ELO change | +20.5 → 1520.5 |
| A loses instead → ELO change | −11.5 → 1488.5 |
Beating a higher-rated opponent earns more than beating an equally rated one, and losing to a higher-rated opponent costs less: with an expected score of 0.360, A stands to gain 20.5 points but lose only 11.5. ELO is stored globally on every user and, for each tag, in a materialized table seeded the first time a topic's leaderboard is requested.
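The ELO example above can be checked with the following sketch. The function names and the two Brier Scores (0.04 and 0.25) are hypothetical, chosen only so that one user is clearly closer to the truth; the formulas and K = 32 come from this page.

```python
K = 32  # K-factor stated in the formula above


def expected_score(elo_a: float, elo_b: float) -> float:
    """Probability-like expectation that A beats B, given both ratings."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))


def actual_score(brier_a: float, brier_b: float) -> float:
    """1 if A's Brier Score is lower (better), 0 if higher, 0.5 if tied."""
    if brier_a < brier_b:
        return 1.0
    if brier_a > brier_b:
        return 0.0
    return 0.5


def elo_delta(elo_a: float, elo_b: float, brier_a: float, brier_b: float) -> float:
    """Rating change for A after one head-to-head comparison with B."""
    return K * (actual_score(brier_a, brier_b) - expected_score(elo_a, elo_b))


expected = expected_score(1500, 1600)
win = elo_delta(1500, 1600, brier_a=0.04, brier_b=0.25)   # A closer to the truth
loss = elo_delta(1500, 1600, brier_a=0.25, brier_b=0.04)  # B closer to the truth

print(f"{expected:.3f}")  # 0.360
print(f"{win:+.1f}")      # +20.5
print(f"{loss:+.1f}")     # -11.5
```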
Glicko-2 — uncertainty-aware skill
Glicko-2 is ELO's more cautious cousin: it tracks not just a skill estimate (μ) but the system's uncertainty about that estimate (σ). The leaderboard ranks by μ − 3σ — a conservative floor, not the raw estimate — so a single lucky call can never outrank a high-volume, consistently accurate forecaster. Every user starts at μ=1500, σ=350.
Worked example
A brand-new user (μ=1500, σ=350) makes a confident, correct forecast (Brier Score 0.04). The system updates against a fixed social-consensus baseline:
| After | μ | σ | Rank (μ − 3σ) |
|---|---|---|---|
| 1 confident correct call | 1649 | 290 | 778 |
| 2 confident correct calls | 1717 | 256 | 950 |
| 1 confident wrong call instead | 1399 | 290 | 528 |
Notice that the rank floor after one great call (778) is still far below the starting μ of 1500, although it is above the new user's own starting floor of 1500 − 3 × 350 = 450. σ is still wide, so the system withholds trust until the pattern repeats. The gap closes as σ shrinks with more resolved forecasts, which is exactly the guarantee Glicko-2 is built to give: volume and consistency beat a single lucky guess. (Reference: Glickman, 2012.)
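The ranking rule itself (not the full Glicko-2 update, which this page does not spell out) is simple enough to sketch. The function name `rank_floor` and the "veteran" values are hypothetical. The other inputs are the rounded μ and σ from the table, so the results land within a point of the Rank column (779 and 949 against 778 and 950), presumably because the displayed μ and σ are rounded.

```python
def rank_floor(mu: float, sigma: float) -> float:
    """Conservative leaderboard value: skill estimate minus three times its uncertainty."""
    return mu - 3 * sigma


new_user = rank_floor(1500, 350)        # starting values for every user
one_good_call = rank_floor(1649, 290)   # rounded values from the table
two_good_calls = rank_floor(1717, 256)  # rounded values from the table
veteran = rank_floor(1600, 60)          # hypothetical: lower mu, much narrower sigma

print(new_user, one_good_call, two_good_calls, veteran)  # 450 779 949 1420
```

Even though the hypothetical veteran's μ is lower than the newcomer's after one good call, the veteran's narrow σ puts them well ahead on the leaderboard.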
Every other scoring system
DAATAN's remaining nine systems each isolate one specific facet of forecasting skill, such as participation volume, market-beating alpha, or recency-weighted consistency.
| System | What it measures | Formula | Scope |
|---|---|---|---|
| Reputation Score (RS) | The original DAATAN score. Earned by correct forecasts weighted by confidence (Confidence Units, or CU, staked); wrong calls reduce it proportionally. | f(confidence, outcome, pool) at resolution | Global only |
| Accuracy | Share of resolved forecasts that came out right. | correct ÷ resolved × 100 | Tag-filtered |
| Most Correct | Raw count of correct resolved forecasts. | count(correct) | Tag-filtered |
| CU Committed | Total Confidence Units staked — a measure of conviction and participation volume. | Σ CU staked | Tag-filtered |
| Peer Score | How much more accurate a user's probability was than the community consensus at the moment they committed. | (community_p − outcome)² − (user_p − outcome)² | Tag-filtered |
| AI Score | How much more accurate a user's probability was than DAATAN's AI estimate at commit time. | (ai_p − outcome)² − (user_p − outcome)² | Tag-filtered |
| TruthScore | Average Peer Score per forecast — how consistently a user beats the crowd, independent of volume. Requires 3+ resolved forecasts. | peer_score_sum ÷ peer_score_count | Tag-filtered |
| ROI | Average net Reputation Score change per resolved forecast. Requires 3+ resolved forecasts. | Σ(rs_change) ÷ count | Tag-filtered |
| Weighted Peer Score | Peer Score with exponential time decay, so recent forecasts count more than old ones — a pattern borrowed from Metaculus. Requires 3+ resolved forecasts. | Σ(peerScore · 0.95^(days/30)) ÷ Σ(0.95^(days/30)) | Tag-filtered |
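The Peer Score, AI Score, TruthScore, and Weighted Peer Score formulas in the table can be sketched together, since the last two are averages of the first. Everything named here is an assumption for illustration: the function names, the sample forecasts, and the reading of "days" as the age of each forecast in days (the table does not say which date the age is measured from). Returning `None` below the three-forecast minimum is also an assumed behavior.

```python
from typing import Optional, Sequence, Tuple

MIN_RESOLVED = 3  # "Requires 3+ resolved forecasts"
DECAY = 0.95      # decay factor applied per 30 days


def peer_score(reference_p: float, user_p: float, outcome: int) -> float:
    """Reference Brier Score minus the user's; positive means the user was closer.

    Pass the community probability for Peer Score or the AI estimate for AI Score.
    """
    return (reference_p - outcome) ** 2 - (user_p - outcome) ** 2


def truth_score(peer_scores: Sequence[float]) -> Optional[float]:
    """Plain average of Peer Scores, or None below the minimum sample size."""
    if len(peer_scores) < MIN_RESOLVED:
        return None
    return sum(peer_scores) / len(peer_scores)


def weighted_peer_score(scored: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Decay-weighted average over (peer_score, age_in_days) pairs."""
    if len(scored) < MIN_RESOLVED:
        return None
    weights = [DECAY ** (days / 30) for _, days in scored]
    total = sum(score * w for (score, _), w in zip(scored, weights))
    return total / sum(weights)


# Hypothetical forecasts: (community_p, user_p, outcome, age_in_days)
forecasts = [(0.60, 0.80, 1, 0), (0.50, 0.30, 0, 30), (0.70, 0.90, 0, 60)]
scores = [peer_score(c, u, o) for c, u, o, _ in forecasts]
ages = [days for *_, days in forecasts]

plain = truth_score(scores)
weighted = weighted_peer_score(list(zip(scores, ages)))
print(plain, weighted)
```

In this sample the user's one bad call is also the oldest, so the decay-weighted average comes out higher than the plain average.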
All twelve systems, side by side
| System | Leaderboard key | Sort direction | Scope |
|---|---|---|---|
| Brier Score | brierScore | Lower is better | Tag-filtered |
| ELO Rating | elo | Higher is better | Global + per-tag |
| Glicko-2 | glicko | Higher is better | Global + per-tag |
| Reputation Score (RS) | rs | Higher is better | Global only |
| Accuracy | accuracy | Higher is better | Tag-filtered |
| Most Correct | totalCorrect | Higher is better | Tag-filtered |
| CU Committed | cuCommitted | Higher is better | Tag-filtered |
| Peer Score | peerScore | Higher is better | Tag-filtered |
| AI Score | aiScore | Higher is better | Tag-filtered |
| TruthScore | truthScore | Higher is better | Tag-filtered |
| ROI | roi | Higher is better | Tag-filtered |
| Weighted Peer Score | weightedPeerScore | Higher is better | Tag-filtered |
Per-tag scores are computed within a single topic only, answering the question "if this were the only subject you ever forecast, how skilled would you be?" Use the leaderboard to filter any system except Reputation Score by topic.
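The sort directions and scopes in the table above translate into a small amount of ranking logic. The sketch below is illustrative only: the row shape, the user names, the helper names, and the choice to leave unscored users out of the ranking are all assumptions. Only the leaderboard keys, the lower-is-better rule for brierScore, and the global-only scope of rs come from this page.

```python
from typing import Any, Dict, List

LOWER_IS_BETTER = {"brierScore"}  # every other key sorts highest first
GLOBAL_ONLY = {"rs"}              # Reputation Score cannot be filtered by tag


def supports_tag_filter(key: str) -> bool:
    """True if the system identified by this leaderboard key can be scoped to one topic."""
    return key not in GLOBAL_ONLY


def rank_rows(rows: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Order rows best-first for the given leaderboard key, skipping users with no score."""
    scored = [row for row in rows if row.get(key) is not None]
    return sorted(scored, key=lambda row: row[key], reverse=key not in LOWER_IS_BETTER)


# Hypothetical leaderboard rows
rows = [
    {"user": "ana", "brierScore": 0.12, "elo": 1540.0},
    {"user": "ben", "brierScore": 0.08, "elo": 1495.5},
    {"user": "cy", "brierScore": None, "elo": 1610.0},
]

by_brier = [row["user"] for row in rank_rows(rows, "brierScore")]
by_elo = [row["user"] for row in rank_rows(rows, "elo")]
print(by_brier)  # ['ben', 'ana']
print(by_elo)    # ['cy', 'ana', 'ben']
```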
Last reviewed: July 2026.