Scoring Methodology

In short:

DAATAN ranks forecasters with twelve scoring systems, each measuring a different facet of skill. The three headline systems are calibration (Brier Score), head-to-head strength (ELO), and uncertainty-adjusted skill (Glicko-2); the rest cover raw accuracy, participation volume, and performance relative to the community and the AI. Every system except Reputation Score can be filtered to a single topic via the leaderboard's ?tag= parameter. This page defines each system, gives its formula, and works a numeric example end to end for each of the three headline systems.

Brier Score — calibration

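The Brier Score measures how well a forecaster's stated confidence matches reality. It is the squared difference between the probability a user assigned to their predicted outcome (from 0 to 1) and what actually happened (1 if the prediction came true, 0 if it did not). Lower is better: a perfect forecaster scores 0, and the worst possible score is 1.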

brierScore = (probability − outcome)²

Worked example

A user commits at 75% confidence that a forecast will resolve "yes." The two possible outcomes score as follows:

Outcome	Calculation	Brier Score
Resolves "yes" (as predicted)	(0.75 − 1)²	0.0625
Resolves "no" (against the prediction)	(0.75 − 0)²	0.5625

Here, being confidently wrong is penalized nine times as heavily as being confidently right (0.5625 ÷ 0.0625 = 9). Because the penalty grows with the square of the miss, overconfidence is expensive, which rewards honest use of the confidence slider instead of everyone claiming 99%.
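The formula and table above translate directly into code. This is a minimal Python sketch, assuming probabilities are passed as fractions (0.75, not 75) and outcomes as 1 or 0; the helper `expected_brier` is a hypothetical illustration, not part of DAATAN. It shows why honest reporting pays: a user who truly believes 75% gets a lower expected Brier Score by reporting 75% than by claiming 99%.

```python
def brier_score(probability: float, outcome: int) -> float:
    """Squared error between stated probability (0..1) and outcome (1 or 0)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (probability - outcome) ** 2


def expected_brier(reported: float, believed: float) -> float:
    """Hypothetical helper: expected Brier Score when the prediction truly
    comes true with probability `believed` but the user reports `reported`."""
    return believed * brier_score(reported, 1) + (1 - believed) * brier_score(reported, 0)


print(brier_score(0.75, 1))  # 0.0625
print(brier_score(0.75, 0))  # 0.5625
# Believing 75%: reporting 75% scores better (lower) in expectation than 99%.
print(expected_brier(0.75, 0.75) < expected_brier(0.99, 0.75))  # True
```

This property, that the expected score is best when the reported probability equals the true belief, is what makes the Brier Score a strictly proper scoring rule.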

ELO Rating — head-to-head strength

When two users commit to the same forecast, the one with the lower Brier Score (closer to the truth) takes ELO from the other — the same rating system used in chess. Higher is better.

expected_A = 1 / (1 + 10^((elo_B − elo_A) / 400))
delta_A    = K × (actual_A − expected_A)     // K = 32
actual_A   = 1 if brier_A < brier_B, 0 if worse, 0.5 if tied

Worked example

User A (ELO 1500) and User B (ELO 1600) both commit to the same forecast. A's call turns out closer to the truth.

Step	Value
A's expected score (as the lower-rated player)	0.360
A wins → ELO change	+20.5 → 1520.5
A loses instead → ELO change	−11.5 → 1488.5

Beating a higher-rated opponent earns more than beating an equally rated one (+16 at K = 32), and losing to a higher-rated opponent costs less than losing to an equal (−16). ELO is stored globally on every user and, per tag, in a materialized table that is seeded the first time a topic's leaderboard is requested.
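A minimal Python sketch of the three ELO formulas, assuming ratings and Brier Scores are plain floats. The function names are illustrative, not DAATAN's. It reproduces the worked example and also shows that each comparison is zero-sum: what A gains, B loses.

```python
def expected_score(elo_a: float, elo_b: float) -> float:
    """Expected score for A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))


def actual_score(brier_a: float, brier_b: float) -> float:
    """1 if A's Brier Score is lower (better), 0 if higher, 0.5 if tied."""
    if brier_a < brier_b:
        return 1.0
    if brier_a > brier_b:
        return 0.0
    return 0.5


def elo_delta(elo_a: float, elo_b: float, brier_a: float, brier_b: float, k: float = 32) -> float:
    """Rating change for A after one head-to-head comparison."""
    return k * (actual_score(brier_a, brier_b) - expected_score(elo_a, elo_b))


# Worked example: A (1500) vs B (1600). The Brier values are arbitrary;
# only which one is lower matters.
print(round(expected_score(1500, 1600), 3))         # 0.36
print(round(elo_delta(1500, 1600, 0.04, 0.25), 1))  # 20.5
print(round(elo_delta(1500, 1600, 0.25, 0.04), 1))  # -11.5
```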

Glicko-2 — uncertainty-aware skill

Glicko-2 is ELO's more cautious cousin: it tracks not just a skill estimate (μ) but also the system's uncertainty about that estimate (σ; Glickman's papers call this quantity the rating deviation, RD, and use σ for a separate volatility parameter). The leaderboard ranks by μ − 3σ, a conservative floor rather than the raw estimate, so a single lucky call is not enough to outrank a high-volume, consistently accurate forecaster. Every user starts at μ=1500, σ=350.
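Ranking by the conservative floor rather than by μ can be sketched as follows. The `Rating` record, the user names and the veteran's numbers are hypothetical; the newcomer's μ and σ are taken from the worked example below. Note that the newcomer has the highest raw μ but still ranks below the veteran.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    user: str
    mu: float     # skill estimate
    sigma: float  # uncertainty about mu (Glickman's RD)

    @property
    def floor(self) -> float:
        """Conservative rank value used by the leaderboard: mu - 3 * sigma."""
        return self.mu - 3 * self.sigma


ratings = [
    Rating("newcomer_one_lucky_call", 1649, 290),  # from the worked example
    Rating("veteran", 1600, 60),                   # hypothetical
    Rating("brand_new", 1500, 350),                # starting values
]

# Best floor first: a high raw mu with a wide sigma does not win.
for r in sorted(ratings, key=lambda r: r.floor, reverse=True):
    print(r.user, r.floor)
```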

Worked example

A brand-new user (μ=1500, σ=350) makes a confident, correct forecast (Brier Score 0.04). The system updates the rating against a fixed social-consensus baseline (all values rounded to whole numbers):

After	μ	σ	Rank (μ − 3σ)
1 confident correct call	1649	290	778
2 confident correct calls	1717	256	950
1 confident wrong call instead	1399	290	528

Notice that the rank floor after one great call (778) is still well below the 1500 starting estimate, even though it has climbed from a newcomer's floor of 1500 − 3 × 350 = 450. σ is still wide, so the system withholds trust until the pattern repeats. The gap closes as σ shrinks with more resolved forecasts, which is exactly what Glicko-2 is designed to do: reward volume and consistency over a single lucky guess. (Reference: Glickman, 2012.)
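For readers who want the mechanics, the sketch below implements one rating-period update following Glickman's published Glicko-2 procedure, in Python. It is the textbook algorithm, not DAATAN's code: this page does not specify how a Brier Score becomes a game result or what the "social-consensus baseline" opponent looks like, so the sketch will not reproduce the table above. The function name `glicko2_update`, the system constant τ = 0.5 (Glickman suggests a value between 0.3 and 1.2) and the convergence tolerance are assumptions. Symbol mapping: this page's μ and σ correspond to Glickman's rating r and rating deviation RD; the `vol` argument is Glickman's σ (volatility). The call at the end uses the worked example from Glickman's description, which reports roughly r′ = 1464.06, RD′ = 151.52 and σ′ = 0.05999.

```python
import math

SCALE = 173.7178  # converts between the 1500-based scale and Glicko-2's internal scale


def _g(phi: float) -> float:
    return 1.0 / math.sqrt(1.0 + 3.0 * phi**2 / math.pi**2)


def _e(mu: float, mu_j: float, phi_j: float) -> float:
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, vol, games, tau=0.5, eps=1e-6):
    """One rating period. games: list of (opp_rating, opp_rd, score), score in {0, 0.5, 1}.
    Returns (new_rating, new_rd, new_vol) on the 1500-based scale."""
    mu = (rating - 1500) / SCALE
    phi = rd / SCALE
    if not games:
        # No games this period: only the deviation grows.
        return rating, math.sqrt(phi**2 + vol**2) * SCALE, vol

    # Steps 3-4: estimated variance v and improvement delta.
    v_inv, delta_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in games:
        mu_j, phi_j = (opp_rating - 1500) / SCALE, opp_rd / SCALE
        g_j, e_j = _g(phi_j), _e(mu, mu_j, phi_j)
        v_inv += g_j**2 * e_j * (1 - e_j)
        delta_sum += g_j * (score - e_j)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # Step 5: new volatility via the Illinois (regula falsi) iteration.
    a = math.log(vol**2)

    def f(x):
        ex = math.exp(x)
        return ex * (delta**2 - phi**2 - v - ex) / (2 * (phi**2 + v + ex) ** 2) - (x - a) / tau**2

    big_a = a
    if delta**2 > phi**2 + v:
        big_b = math.log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        big_b = a - k * tau
    f_a, f_b = f(big_a), f(big_b)
    while abs(big_b - big_a) > eps:
        big_c = big_a + (big_a - big_b) * f_a / (f_b - f_a)
        f_c = f(big_c)
        if f_c * f_b <= 0:
            big_a, f_a = big_b, f_b
        else:
            f_a /= 2
        big_b, f_b = big_c, f_c
    new_vol = math.exp(big_a / 2)

    # Steps 6-8: new deviation and rating, converted back to the 1500 scale.
    phi_star = math.sqrt(phi**2 + new_vol**2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star**2 + 1.0 / v)
    new_mu = mu + new_phi**2 * delta_sum
    return new_mu * SCALE + 1500, new_phi * SCALE, new_vol


# Glickman's example: a 1500/200 player wins once and loses twice.
r, rd, vol = glicko2_update(1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])
print(round(r, 2), round(rd, 2), round(vol, 5), round(r - 3 * rd))  # last value: mu - 3*sigma floor
```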

Every other scoring system

DAATAN's remaining nine systems each isolate a narrower facet of forecasting skill, such as participation volume, accuracy relative to the community or the AI, or recency-weighted consistency.

System	What it measures	Formula	Scope
Reputation Score (RS)	The original DAATAN score. Earned by correct forecasts weighted by confidence (CU staked); wrong calls reduce it proportionally.	f(confidence, outcome, pool) at resolution	Global only
Accuracy	Share of resolved forecasts that came out right.	correct ÷ resolved × 100	Tag-filtered
Most Correct	Raw count of correct resolved forecasts.	count(correct)	Tag-filtered
CU Committed	Total Confidence Units staked — a measure of conviction and participation volume.	Σ CU staked	Tag-filtered
Peer Score	How much more accurate a user's probability was than the community consensus at the moment they committed.	(community_p − outcome)² − (user_p − outcome)²	Tag-filtered
AI Score	How much more accurate a user's probability was than DAATAN's AI estimate at commit time.	(ai_p − outcome)² − (user_p − outcome)²	Tag-filtered
TruthScore	Average Peer Score per forecast — how consistently a user beats the crowd, independent of volume. Requires 3+ resolved forecasts.	peer_score_sum ÷ peer_score_count	Tag-filtered
ROI	Average net Reputation Score change per resolved forecast. Requires 3+ resolved forecasts.	Σ(rs_change) ÷ count	Tag-filtered
Weighted Peer Score	Peer Score with exponential time decay, so recent forecasts count more than old ones — a pattern borrowed from Metaculus. Requires 3+ resolved forecasts.	Σ(peerScore · 0.95^(days/30)) ÷ Σ(0.95^(days/30))	Tag-filtered
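The formulas in this table can be sketched in a few lines of Python. The `Forecast` record and its field names are hypothetical; all three probabilities are assumed to refer to the same event, with `outcome` 1 if it happened and 0 if not; `days` is assumed to be the forecast's age in days; and every forecast is assumed to have a community probability, so the Peer Score count equals the number of forecasts. The 3-forecast minimum is applied to TruthScore, ROI and Weighted Peer Score as the table states. Reputation Score is omitted because its formula is not given here.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_RESOLVED = 3  # TruthScore, ROI and Weighted Peer Score need 3+ resolved forecasts


@dataclass
class Forecast:          # hypothetical record for one resolved forecast
    user_p: float        # user's probability for the event (0..1)
    community_p: float   # community consensus at commit time
    ai_p: float          # DAATAN AI estimate at commit time
    outcome: int         # 1 if the event happened, 0 if not
    correct: bool        # did the user's call come true?
    rs_change: float     # net Reputation Score change at resolution
    days: float          # assumed: age of the forecast in days


def peer_score(f: Forecast) -> float:
    """Positive when the user beat the community consensus."""
    return (f.community_p - f.outcome) ** 2 - (f.user_p - f.outcome) ** 2


def ai_score(f: Forecast) -> float:
    """Positive when the user beat the AI estimate."""
    return (f.ai_p - f.outcome) ** 2 - (f.user_p - f.outcome) ** 2


def accuracy(fs: List[Forecast]) -> Optional[float]:
    return 100 * sum(f.correct for f in fs) / len(fs) if fs else None


def truth_score(fs: List[Forecast]) -> Optional[float]:
    if len(fs) < MIN_RESOLVED:
        return None
    return sum(peer_score(f) for f in fs) / len(fs)


def roi(fs: List[Forecast]) -> Optional[float]:
    if len(fs) < MIN_RESOLVED:
        return None
    return sum(f.rs_change for f in fs) / len(fs)


def weighted_peer_score(fs: List[Forecast]) -> Optional[float]:
    if len(fs) < MIN_RESOLVED:
        return None
    # Each forecast's weight shrinks by a factor of 0.95 per 30 days of age.
    weights = [0.95 ** (f.days / 30) for f in fs]
    return sum(w * peer_score(f) for w, f in zip(weights, fs)) / sum(weights)
```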

All twelve systems, side by side

System	Leaderboard key	Sort direction	Scope
Brier Score	brierScore	Lower is better	Tag-filtered
ELO Rating	elo	Higher is better	Global + per-tag
Glicko-2	glicko	Higher is better	Global + per-tag
Reputation Score (RS)	rs	Higher is better	Global only
Accuracy	accuracy	Higher is better	Tag-filtered
Most Correct	totalCorrect	Higher is better	Tag-filtered
CU Committed	cuCommitted	Higher is better	Tag-filtered
Peer Score	peerScore	Higher is better	Tag-filtered
AI Score	aiScore	Higher is better	Tag-filtered
TruthScore	truthScore	Higher is better	Tag-filtered
ROI	roi	Higher is better	Tag-filtered
Weighted Peer Score	weightedPeerScore	Higher is better	Tag-filtered

Per-tag scores are computed within a single topic only — "if this were the only subject you ever forecast, how skilled would you be?" See the leaderboard to filter any of these by topic.
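The keys and sort directions in the table above, together with per-tag scoping, can be combined into a small ranking helper. This is a hypothetical sketch: the forecast dictionaries, their `tags`, `correct` and `brier` fields, the function names, and the choice of the mean Brier Score as a user's `brierScore` are assumptions; only the key names and the fact that `brierScore` alone is lower-is-better come from the table.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# From the table: brierScore is the only lower-is-better key.
LOWER_IS_BETTER = {"brierScore"}


def rank(rows: List[Dict], key: str) -> List[Dict]:
    """Order leaderboard rows (dicts holding `key`) best-first."""
    return sorted(rows, key=lambda row: row[key], reverse=key not in LOWER_IS_BETTER)


def per_tag_rows(forecasts: List[Dict], tag: Optional[str]) -> List[Dict]:
    """Build per-user rows from only the forecasts carrying `tag`
    (all forecasts when tag is None)."""
    by_user = defaultdict(list)
    for f in forecasts:
        if tag is None or tag in f["tags"]:
            by_user[f["user"]].append(f)
    return [
        {
            "user": user,
            "accuracy": 100 * sum(f["correct"] for f in fs) / len(fs),
            "brierScore": sum(f["brier"] for f in fs) / len(fs),  # assumed: mean Brier
        }
        for user, fs in by_user.items()
    ]


forecasts = [
    {"user": "ana", "tags": {"economy"}, "correct": True, "brier": 0.04},
    {"user": "ana", "tags": {"sports"}, "correct": False, "brier": 0.64},
    {"user": "ben", "tags": {"economy"}, "correct": True, "brier": 0.16},
    {"user": "ben", "tags": {"sports"}, "correct": True, "brier": 0.09},
]

# Within "economy" ana leads on Brier Score; across all topics ben does.
print([r["user"] for r in rank(per_tag_rows(forecasts, "economy"), "brierScore")])
print([r["user"] for r in rank(per_tag_rows(forecasts, None), "brierScore")])
```

The two orderings differ because each per-tag leaderboard only sees forecasts from that topic, which is the "only subject you ever forecast" question in practice.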

Last reviewed: July 2026.