Calibration

Probability vs Reality

When the model says 60%, does the bet actually win 60% of the time? In the diagram below, each point is one 10-percentage-point probability bucket. Points on the diagonal are perfectly calibrated.

ECE

4.39%

bet-weighted mean absolute bucket gap · <5% is good

Brier Score

0.2089

mean squared error of predicted probability vs. outcome · <0.25 beats always predicting 50%
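
As a rough illustration of the 0.25 threshold, here is a minimal Python sketch of the Brier score. The function name `brier_score` and the five sample bets are hypothetical, made up for the example; they are not taken from the 262 settled bets behind this page.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("need at least one settled bet")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical settled bets: 1 = win, 0 = loss (illustrative data only).
outcomes = [1, 0, 1, 1, 0]

# Always predicting 50% scores (0.5)^2 = 0.25 on every bet, win or lose.
coin_flip = brier_score([0.5] * len(outcomes), outcomes)

# Predictions that lean the right way score below that baseline.
example = brier_score([0.7, 0.3, 0.6, 0.8, 0.4], outcomes)

print(f"coin flip: {coin_flip:.4f}, example model: {example:.4f}")
```

Lower is better: 0 means every prediction was certain and correct, while 0.25 is what a constant 50% forecast earns on any set of outcomes.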

Settled Bets

262

contributing to calibration

Reliability Diagram

Predicted probability → actual win rate

Per-bucket breakdown

Prob bucket	Predicted	Actual	Gap (predicted - actual)	n
10–20%	14.9%	16.7%	-1.7pp	12
20–30%	25.7%	13.8%	+11.9pp	29
30–40%	35.4%	33.3%	+2.0pp	27
40–50%	44.7%	56.3%	-11.6pp	32
50–60%	55.6%	52.3%	+3.4pp	44
60–70%	64.2%	63.8%	+0.4pp	58
70–80%	74.6%	76.2%	-1.6pp	42
80–90%	83.5%	86.7%	-3.2pp	15
90–100%	92.5%	66.7%	+25.8pp	3
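
The ECE figure can be approximately recomputed from this table. The sketch below assumes ECE here means the bet-weighted mean of the absolute predicted-minus-actual gap per bucket, which is the usual definition but is not confirmed by the page. The function name `expected_calibration_error` is an illustrative choice. Because the table values are rounded to one decimal, the result is expected to differ slightly from the reported 4.39%.

```python
# Rows copied from the per-bucket table: (predicted %, actual %, n).
BUCKETS = [
    (14.9, 16.7, 12),
    (25.7, 13.8, 29),
    (35.4, 33.3, 27),
    (44.7, 56.3, 32),
    (55.6, 52.3, 44),
    (64.2, 63.8, 58),
    (74.6, 76.2, 42),
    (83.5, 86.7, 15),
    (92.5, 66.7, 3),
]


def expected_calibration_error(buckets):
    """Bet-weighted mean of |predicted - actual| across buckets, in pp."""
    total = sum(n for _, _, n in buckets)
    weighted_gap = sum(n * abs(pred - actual) for pred, actual, n in buckets)
    return weighted_gap / total


total_bets = sum(n for _, _, n in BUCKETS)
ece = expected_calibration_error(BUCKETS)

print(f"settled bets: {total_bets}, ECE from rounded table: {ece:.2f}pp")
```

The bucket counts sum to the 262 settled bets shown above. Two buckets (20–30% and 40–50%) account for most of the weighted gap. The 90–100% bucket has the largest gap but only 3 bets, so it contributes little.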