DAATAN

Scoring Methodology

In short:

DAATAN ranks forecasters with twelve scoring systems, each measuring a different facet of skill. Chief among them are raw accuracy, calibration (Brier Score), head-to-head strength (ELO), and uncertainty-adjusted skill (Glicko-2). Every system except Reputation Score can be filtered to a single topic via the leaderboard's ?tag= parameter. This page defines each one and gives its formula, and it works a numeric example end to end for Brier Score, ELO, and Glicko-2.

Brier Score — calibration

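The Brier Score measures how well a forecaster's stated confidence matches reality. It is the squared difference between the probability a user assigned to "yes" and the actual outcome, coded as 1 if the forecast resolves "yes" and 0 if it resolves "no". Lower is better: a perfect forecaster scores 0, and the worst possible score is 1.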

brierScore = (probability − outcome)²

Worked example

A user commits at 75% confidence that a forecast will resolve "yes." The table shows the score for both possible outcomes.

Outcome | Calculation | Brier Score
--- | --- | ---
Resolves "yes" (as predicted) | (0.75 − 1)² | 0.0625
Resolves "no" (against the prediction) | (0.75 − 0)² | 0.5625

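Here, being wrong at 75% confidence costs nine times as much as being right (0.5625 vs. 0.0625), because the penalty grows with the square of the error. The Brier Score is also a proper scoring rule: a forecaster minimizes their expected score by reporting their true belief. That is what rewards honest use of the confidence slider instead of everyone claiming 99%.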
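A minimal Python sketch of the formula above, assuming outcomes are coded 1 for "yes" and 0 for "no". The helper names `brier_score` and `expected_brier` are illustrative, not DAATAN's code. The second function shows why honest reporting pays: for a forecaster whose true belief is q, the expected Brier Score of reporting p is lowest when p = q.

```python
def brier_score(probability: float, outcome: int) -> float:
    """Squared difference between a stated probability of "yes" and the 0/1 outcome."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 (no) or 1 (yes)")
    return (probability - outcome) ** 2


def expected_brier(reported: float, true_belief: float) -> float:
    """Expected Brier Score of reporting `reported` when "yes" truly has probability `true_belief`."""
    return (true_belief * brier_score(reported, 1)
            + (1 - true_belief) * brier_score(reported, 0))


print(brier_score(0.75, 1))  # 0.0625
print(brier_score(0.75, 0))  # 0.5625
# Someone who really believes 75% does better, on average, by saying 75% than 99%.
print(expected_brier(0.75, 0.75) < expected_brier(0.99, 0.75))  # True
```

The honest report averages 0.1875 here, while claiming 99% averages about 0.245.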

ELO Rating — head-to-head strength

When two users commit to the same forecast, the one with the lower Brier Score (closer to the truth) takes ELO from the other — the same rating system used in chess. Higher is better.

expected_A = 1 / (1 + 10^((elo_B − elo_A) / 400))
delta_A    = K × (actual_A − expected_A)     // K = 32
actual_A   = 1 if brier_A < brier_B, 0 if worse, 0.5 if tied

Worked example

User A (ELO 1500) and User B (ELO 1600) both commit to the same forecast. A's call turns out closer to the truth.

Step | Value
--- | ---
A's expected score (as the lower-rated player) | 0.360
A wins → ELO change | +20.5 → 1520.5
A loses instead → ELO change | −11.5 → 1488.5

Beating a higher-rated opponent earns more than beating an equally rated one (+16, since expected_A = 0.5), and losing to a higher-rated opponent costs less than losing to an equal (−16). ELO is stored globally on every user and, per tag, in a materialized table seeded the first time a topic's leaderboard is requested.
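A minimal Python sketch of the update rule and worked example above. It assumes both users share K = 32, so whatever A gains B loses, which is consistent with A "taking" ELO from B. The function names are illustrative, and the Brier Scores 0.04 and 0.25 in the usage line are hypothetical inputs; only their order matters.

```python
K = 32  # K-factor given on this page


def expected_score(elo_a: float, elo_b: float) -> float:
    """Expected score for A against B on the standard Elo logistic curve."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))


def actual_score(brier_a: float, brier_b: float) -> float:
    """1 if A's Brier Score is lower (better), 0 if higher, 0.5 on a tie."""
    if brier_a < brier_b:
        return 1.0
    if brier_a > brier_b:
        return 0.0
    return 0.5


def elo_update(elo_a, elo_b, brier_a, brier_b, k=K):
    """Return both new ratings; with a shared K the exchange is zero-sum."""
    delta_a = k * (actual_score(brier_a, brier_b) - expected_score(elo_a, elo_b))
    return elo_a + delta_a, elo_b - delta_a


print(round(expected_score(1500, 1600), 3))  # 0.36
new_a, new_b = elo_update(1500, 1600, brier_a=0.04, brier_b=0.25)
print(round(new_a, 1), round(new_b, 1))  # 1520.5 1579.5
```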

Glicko-2 — uncertainty-aware skill

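Glicko-2 is ELO's more cautious cousin: it tracks not just a skill estimate (μ) but also the system's uncertainty about that estimate (σ). On this page σ is what Glickman's paper calls the rating deviation (RD); Glicko-2 also maintains a separate volatility parameter, which the paper writes as σ. The leaderboard ranks by μ − 3σ, a conservative floor rather than the raw estimate, so a single lucky call is not enough to lift a newcomer above forecasters with a long, consistently accurate record. Every user starts at μ=1500, σ=350.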

Worked example

A brand-new user (μ=1500, σ=350) makes a confident, correct forecast (Brier Score 0.04). The system updates against a fixed social-consensus baseline:

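After | μ | σ | Rank (μ − 3σ)
--- | --- | --- | ---
1 confident correct call | 1649 | 290 | 778
2 confident correct calls | 1717 | 256 | 950
1 confident wrong call instead | 1399 | 290 | 528

Values are rounded to whole numbers, so μ − 3σ computed from the rounded columns can differ from the rank column by 1.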

Notice that the rank floor after one great call (778) has risen from the new user's starting floor of 450 (1500 − 3 × 350) but is still far below a μ of 1500. σ is still wide, so the system withholds trust until the pattern repeats. The gap narrows as σ shrinks with more resolved forecasts, which is exactly what Glicko-2 is designed to do: reward volume and consistency over a single lucky guess. (Reference: Glickman, 2012.)
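A minimal Python sketch of a single Glicko-2 update, following the step-by-step procedure in Glickman's paper, including the Illinois-method volatility update. Several inputs are assumptions, not DAATAN's documented behavior. The "fixed social-consensus baseline" is modeled as a hypothetical opponent rated 1500 with deviation 350, and the game score is taken as 1 − Brier Score. The starting volatility of 0.06 and τ = 0.5 are the values used in the paper's worked example. In the code, `rd` is the deviation this page calls σ, and `vol` is Glicko-2's volatility. Under these assumptions, the first update lands near μ ≈ 1649 and σ ≈ 290, in line with the first table row. The page does not specify enough to reproduce the other rows exactly.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion factor between display and internal scale


def _g(phi: float) -> float:
    """Down-weights a result by the opponent's own uncertainty."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def _expected(mu: float, mu_j: float, phi_j: float) -> float:
    """Expected score against opponent j on the internal scale."""
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(r, rd, vol, opp_r, opp_rd, score, tau=0.5, eps=1e-6):
    """One rating period with a single result; returns (new_r, new_rd, new_vol)."""
    mu, phi = (r - 1500) / SCALE, rd / SCALE
    mu_j, phi_j = (opp_r - 1500) / SCALE, opp_rd / SCALE

    g_j = _g(phi_j)
    e_j = _expected(mu, mu_j, phi_j)
    v = 1.0 / (g_j ** 2 * e_j * (1 - e_j))  # estimated variance from this result
    delta = v * g_j * (score - e_j)         # estimated rating improvement

    # Volatility update: root of f found with the Illinois (regula falsi) method.
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2 * (phi ** 2 + v + ex) ** 2)) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    f_a, f_b = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * f_a / (f_b - f_a)
        f_c = f(C)
        if f_c * f_b <= 0:
            A, f_a = B, f_b
        else:
            f_a /= 2
        B, f_b = C, f_c
    new_vol = math.exp(A / 2)

    # Widen the deviation by volatility, then shrink it with the new evidence.
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * g_j * (score - e_j)
    return SCALE * new_mu + 1500, SCALE * new_phi, new_vol


# Hypothetical baseline opponent (1500, 350) and score = 1 - Brier Score.
r1, rd1, vol1 = glicko2_update(1500, 350, 0.06, 1500, 350, score=1 - 0.04)
print(round(r1), round(rd1), round(r1 - 3 * rd1))  # 1649 290 778
```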

Every other scoring system

DAATAN's remaining nine systems each isolate one specific facet of forecasting skill, such as participation volume, market-beating alpha, or recency-weighted consistency.

System | What it measures | Formula | Scope
--- | --- | --- | ---
Reputation Score (RS) | The original DAATAN score. Earned by correct forecasts weighted by confidence (CU staked); wrong calls reduce it proportionally. | f(confidence, outcome, pool) at resolution | Global only
Accuracy | Share of resolved forecasts that came out right. | correct ÷ resolved × 100 | Tag-filtered
Most Correct | Raw count of correct resolved forecasts. | count(correct) | Tag-filtered
CU Committed | Total Confidence Units staked, a measure of conviction and participation volume. | Σ CU staked | Tag-filtered
Peer Score | How much more accurate a user's probability was than the community consensus at the moment they committed. | (community_p − outcome)² − (user_p − outcome)² | Tag-filtered
AI Score | How much more accurate a user's probability was than DAATAN's AI estimate at commit time. | (ai_p − outcome)² − (user_p − outcome)² | Tag-filtered
TruthScore | Average Peer Score per forecast: how consistently a user beats the crowd, independent of volume. Requires 3+ resolved forecasts. | peer_score_sum ÷ peer_score_count | Tag-filtered
ROI | Average net Reputation Score change per resolved forecast. Requires 3+ resolved forecasts. | Σ(rs_change) ÷ count | Tag-filtered
Weighted Peer Score | Peer Score with exponential time decay, so recent forecasts count more than old ones (a pattern borrowed from Metaculus). Requires 3+ resolved forecasts. | Σ(peerScore · 0.95^(days/30)) ÷ Σ(0.95^(days/30)) | Tag-filtered
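The per-forecast and aggregate formulas in the table can be sketched in Python as follows. This is illustrative only. `ResolvedForecast` and its field names are hypothetical, and `days` is assumed to be the forecast's age in days. `correct` is taken as given because this page does not define when a probabilistic call counts as correct. Reputation Score is left out because its formula is not published here. The systems that require 3+ resolved forecasts return `None` below that threshold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MIN_RESOLVED = 3  # TruthScore, ROI and Weighted Peer Score need 3+ resolved forecasts
DECAY = 0.95      # weight multiplier per 30 days of age


@dataclass
class ResolvedForecast:
    """Hypothetical record of one resolved forecast (field names are illustrative)."""
    user_p: float       # user's probability of "yes"
    community_p: float  # community consensus at commit time
    ai_p: float         # DAATAN's AI estimate at commit time
    outcome: int        # 1 = resolved "yes", 0 = resolved "no"
    correct: bool       # whether the call counted as correct
    rs_change: float    # net Reputation Score change at resolution
    days: float         # assumed: age of the forecast in days


def peer_score(f: ResolvedForecast) -> float:
    """Community Brier Score minus the user's; positive means the user beat the crowd."""
    return (f.community_p - f.outcome) ** 2 - (f.user_p - f.outcome) ** 2


def ai_score(f: ResolvedForecast) -> float:
    """AI Brier Score minus the user's; positive means the user beat the AI."""
    return (f.ai_p - f.outcome) ** 2 - (f.user_p - f.outcome) ** 2


def accuracy(fs: Sequence[ResolvedForecast]) -> Optional[float]:
    return 100 * sum(f.correct for f in fs) / len(fs) if fs else None


def truth_score(fs: Sequence[ResolvedForecast]) -> Optional[float]:
    if len(fs) < MIN_RESOLVED:
        return None
    return sum(peer_score(f) for f in fs) / len(fs)


def roi(fs: Sequence[ResolvedForecast]) -> Optional[float]:
    if len(fs) < MIN_RESOLVED:
        return None
    return sum(f.rs_change for f in fs) / len(fs)


def weighted_peer_score(fs: Sequence[ResolvedForecast]) -> Optional[float]:
    """Decay-weighted mean Peer Score with weight 0.95 ** (days / 30)."""
    if len(fs) < MIN_RESOLVED:
        return None
    weights = [DECAY ** (f.days / 30) for f in fs]
    total = sum(w * peer_score(f) for w, f in zip(weights, fs))
    return total / sum(weights)


history = [
    ResolvedForecast(0.9, 0.6, 0.7, 1, True, 5.0, days=0),
    ResolvedForecast(0.2, 0.5, 0.4, 0, True, 3.0, days=30),
    ResolvedForecast(0.8, 0.5, 0.5, 0, False, -4.0, days=60),
]
print(round(peer_score(history[0]), 2))  # 0.15
print(round(accuracy(history), 1), round(roi(history), 2))  # 66.7 1.33
print(truth_score(history[:2]))  # None: fewer than 3 resolved forecasts
```

The weighted version divides by the sum of weights, so it stays on the same scale as a plain average Peer Score. Only the relative influence of old and recent forecasts changes.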

All twelve systems, side by side

System | Leaderboard key | Sort direction | Scope
--- | --- | --- | ---
Brier Score | brierScore | Lower is better | Tag-filtered
ELO Rating | elo | Higher is better | Global + per-tag
Glicko-2 | glicko | Higher is better | Global + per-tag
Reputation Score (RS) | rs | Higher is better | Global only
Accuracy | accuracy | Higher is better | Tag-filtered
Most Correct | totalCorrect | Higher is better | Tag-filtered
CU Committed | cuCommitted | Higher is better | Tag-filtered
Peer Score | peerScore | Higher is better | Tag-filtered
AI Score | aiScore | Higher is better | Tag-filtered
TruthScore | truthScore | Higher is better | Tag-filtered
ROI | roi | Higher is better | Tag-filtered
Weighted Peer Score | weightedPeerScore | Higher is better | Tag-filtered

Per-tag scores are computed within a single topic only, answering the question "if this were the only subject you ever forecast, how skilled would you be?" See the leaderboard to filter any system except Reputation Score by topic.
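The table's keys and sort directions, and what "computed within a single topic" means, can be illustrated with a hypothetical in-memory sketch. The tuple layout and the functions `accuracy_by_user` and `rank` are illustrative names, not DAATAN's API. The real leaderboard and its ?tag= handling are not described here. Accuracy is used because its formula (correct ÷ resolved × 100) is simple, but the same filter-then-aggregate pattern applies to the other tag-filtered systems.

```python
from collections import defaultdict

# Sort direction per leaderboard key (True = higher is better), taken from the table.
HIGHER_IS_BETTER = {
    "brierScore": False, "elo": True, "glicko": True, "rs": True,
    "accuracy": True, "totalCorrect": True, "cuCommitted": True,
    "peerScore": True, "aiScore": True, "truthScore": True,
    "roi": True, "weightedPeerScore": True,
}
GLOBAL_ONLY = {"rs"}  # Reputation Score cannot be tag-filtered


def accuracy_by_user(forecasts, tag=None):
    """Accuracy per user; with a tag, only that topic's forecasts are counted."""
    correct, resolved = defaultdict(int), defaultdict(int)
    for user, tags, was_correct in forecasts:
        if tag is not None and tag not in tags:
            continue  # outside the topic: ignored entirely, not counted as wrong
        resolved[user] += 1
        correct[user] += int(was_correct)
    return {user: 100 * correct[user] / resolved[user] for user in resolved}


def rank(scores, key, tag=None):
    """Order users best-first using the key's sort direction."""
    if tag is not None and key in GLOBAL_ONLY:
        raise ValueError(f"{key} is global only and has no per-tag ranking")
    return sorted(scores, key=scores.get, reverse=HIGHER_IS_BETTER[key])


forecasts = [  # (user, tags, resolved correctly?)
    ("ana", {"economy"}, True),
    ("ana", {"sports"}, False),
    ("ben", {"economy"}, False),
    ("ben", {"sports"}, True),
    ("ben", {"sports"}, True),
]
print(rank(accuracy_by_user(forecasts), "accuracy"))                        # ['ben', 'ana']
print(rank(accuracy_by_user(forecasts, "economy"), "accuracy", "economy"))  # ['ana', 'ben']
print(rank({"ana": 0.12, "ben": 0.30}, "brierScore"))                       # ['ana', 'ben']
```

The order flips between the global and the economy leaderboards because each ranking sees only the forecasts in its own scope.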

Last reviewed: July 2026.
