XDRating is a community-powered ranking system built on blind comparisons. Every vote feeds three scores, each capturing a different dimension of model performance.
The XD score is the single number that answers "how good is this model, all things considered?" It blends quality, value, and stickiness into one ranking.
XD combines all three signals with weighted importance — quality and value contribute equally as the primary factors, with stickiness as a secondary signal. This means a model needs to be both good AND reasonably priced to rank high overall.
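As a rough illustration of this kind of blend, here is a minimal sketch that assumes a plain weighted average. The weights (0.4, 0.4, 0.2), the function name `xd_score`, and the 0-100 input scale are all hypothetical. They only respect the stated constraint that quality and value weigh equally and stickiness weighs less. They are not XDRating's published formula.

```python
# Hypothetical weights: quality and value equal, stickiness smaller.
# These numbers are illustrative assumptions, not XDRating's actual values.
W_QUALITY = 0.4
W_VALUE = 0.4
W_STICKINESS = 0.2


def xd_score(quality: float, value: float, stickiness: float) -> float:
    """Blend three scores (assumed to share a common scale) into one number."""
    return W_QUALITY * quality + W_VALUE * value + W_STICKINESS * stickiness


# Made-up inputs: a strong but pricey model versus a balanced one.
pricey = xd_score(quality=90, value=50, stickiness=60)
balanced = xd_score(quality=75, value=80, stickiness=70)
print(pricey, balanced)
```

Under these assumed weights the balanced model scores higher than the pricey one, which mirrors the point above: quality alone is not enough to rank high overall.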
Pure quality signal. Before any price information is shown, users vote on which model produced the better output. This captures raw preference uncontaminated by cost anchoring.
Every XDuel blind vote and XCreate selection contributes to XDR-Q. Models are compared pairwise — each matchup produces a winner, loser, or tie. The rating reflects how often a model wins head-to-head matchups across the community.
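The text does not say how matchups are turned into a rating, so the following is only a sketch of the simplest option: a head-to-head win rate, with a tie counted as half a win for each side. The tuple layout, the outcome labels, and the tie handling are assumptions made for illustration. A production system might use a different method entirely.

```python
from collections import defaultdict


def win_rates(matchups):
    """Compute each model's share of wins across pairwise matchups.

    matchups: iterable of (model_a, model_b, outcome) tuples, where outcome
    is "a", "b", or "tie" (a hypothetical encoding for this sketch).
    """
    points = defaultdict(float)
    played = defaultdict(int)
    for model_a, model_b, outcome in matchups:
        if outcome not in ("a", "b", "tie"):
            raise ValueError(f"unknown outcome: {outcome!r}")
        played[model_a] += 1
        played[model_b] += 1
        if outcome == "a":
            points[model_a] += 1.0
        elif outcome == "b":
            points[model_b] += 1.0
        else:
            # Assumption: a tie is worth half a win to each model.
            points[model_a] += 0.5
            points[model_b] += 0.5
    return {model: points[model] / played[model] for model in played}


# Made-up blind-vote results for three placeholder models.
blind_matchups = [
    ("alpha", "beta", "a"),
    ("alpha", "gamma", "tie"),
    ("beta", "gamma", "b"),
]
rates = win_rates(blind_matchups)
print(rates)
```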
Quality adjusted for price. After prices are revealed, users vote again. This second vote captures the real decision people face: is the better model worth the price difference?
XDR-V is derived from post-reveal votes, cast after users have seen both the outputs and the prices. A model that wins informed votes despite being expensive has strong Value. A cheap model that gains votes after the price reveal has even stronger Value.
How often a model keeps its vote after the price reveal. If you picked Model A blind and still picked it after seeing the prices, that model is sticky.
Stickiness measures how well a model's blind-vote appeal survives price sensitivity. A model with high Stickiness is genuinely valued by users regardless of cost. Low Stickiness means the model's appeal drops once people see the bill.
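One way to picture the "keeps its vote" idea is as a retention rate: of the blind votes a model received, what fraction did it still hold after the price reveal? The sketch below assumes that reading. The `Duel` record and its field names are hypothetical, and ties are left out for simplicity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Duel:
    """One user's two votes on the same matchup (hypothetical record shape)."""
    blind_pick: str     # model chosen before prices were shown
    informed_pick: str  # model chosen after prices were shown


def stickiness(duels):
    """Fraction of each model's blind votes that it kept after the reveal."""
    blind = Counter()
    kept = Counter()
    for duel in duels:
        blind[duel.blind_pick] += 1
        if duel.informed_pick == duel.blind_pick:
            kept[duel.blind_pick] += 1
    # Only models with at least one blind vote appear, so no division by zero.
    return {model: kept[model] / blind[model] for model in blind}


# Made-up votes: Model A loses one of its three blind votes after the reveal.
duels = [
    Duel(blind_pick="A", informed_pick="A"),
    Duel(blind_pick="A", informed_pick="B"),
    Duel(blind_pick="B", informed_pick="B"),
    Duel(blind_pick="A", informed_pick="A"),
]
sticky = stickiness(duels)
print(sticky)
```

In this made-up sample, Model B keeps every blind vote it earned, while Model A keeps two out of three.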
Every XDuel generates two votes from the same user on the same matchup. The difference between those votes is the signal.
Benchmarks measure capability. Chatbot Arena measures preference. Neither tells you what matters most: is this model worth the money for your use case?
By collecting two votes per matchup — one blind, one price-informed — we can separate pure quality judgment from price-adjusted value. The gap between those two votes reveals Stickiness: the models people genuinely prefer regardless of cost.
A model with high XDR-Q but low XDR-V is impressive but overpriced. A model with moderate XDR-Q but high XDR-V is a hidden gem. A model with high XDR-S is the real deal — people pick it and don't look back.
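These reading rules can be written down as a small lookup. The sketch below assumes scores normalized to the range 0 to 1 and uses made-up cutoffs for "high" and "low". Both the thresholds and the function names are hypothetical and serve only to make the three patterns concrete.

```python
# Hypothetical cutoffs on a 0-to-1 scale. Illustrative only.
HIGH = 0.7
LOW = 0.4


def level(score: float) -> str:
    """Bucket a normalized score into high, moderate, or low."""
    if score >= HIGH:
        return "high"
    if score < LOW:
        return "low"
    return "moderate"


def read_profile(xdr_q: float, xdr_v: float, xdr_s: float) -> list[str]:
    """Return the descriptions from the text that match a score profile."""
    labels = []
    if level(xdr_q) == "high" and level(xdr_v) == "low":
        labels.append("impressive but overpriced")
    if level(xdr_q) == "moderate" and level(xdr_v) == "high":
        labels.append("hidden gem")
    if level(xdr_s) == "high":
        labels.append("the real deal")
    return labels


# Made-up profiles for two placeholder models.
print(read_profile(xdr_q=0.9, xdr_v=0.3, xdr_s=0.5))
print(read_profile(xdr_q=0.5, xdr_v=0.8, xdr_s=0.75))
```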
Every vote makes the rankings smarter