// METHODOLOGY

How XDRating Works

XDRating is a community-powered ranking system built on blind comparisons. Every vote feeds three independent scores that capture different dimensions of model performance.

🏆
XD
Overall Score
Source: Composite of Q + V + S

The XD score is the single number that answers "how good is this model, all things considered?" It blends quality, value, and stickiness into one ranking.

XD combines all three signals with weighted importance — quality and value contribute equally as the primary factors, with stickiness as a secondary signal. This means a model needs to be both good AND reasonably priced to rank high overall.

⚔️
XDR-Q
Quality
Source: Blind vote (vote 1)

Pure quality signal. Before any price information is shown, users vote on which model produced the better output. This captures raw preference uncontaminated by cost anchoring.

Every XDuel blind vote and XCreate selection contributes to XDR-Q. Models are compared pairwise — each matchup produces a winner, loser, or tie. The rating reflects how often a model wins head-to-head matchups across the community.

💰
XDR-V
Value
Source: Informed vote (vote 2)

Quality adjusted for price. After prices are revealed, users vote again. This second vote captures the real decision people face: is the better model worth the price difference?

XDR-V is derived from post-reveal votes where users can see both model quality and cost. A model that wins informed votes despite being expensive has strong Value. A cheap model that gains votes after price reveal has even stronger Value.

🧲
XDR-S
Stickiness
Source: Vote retention

How often a model keeps its vote after the price reveal. If you picked Model A blind and still picked it knowing prices — that model is sticky.

Stickiness measures the gap between perceived quality and price sensitivity. A model with high Stickiness is genuinely valued by users regardless of cost. Low Stickiness means the model's appeal drops once people see the bill.

Data Collection

Every XDuel generates two votes from the same user on the same matchup. The difference between those votes is the signal.

01
Prompt
User enters a prompt. Two random models are selected.
02
Blind Vote
Both models respond anonymously. User picks the better output.
XDR-Q
03
Price Reveal
Per-token and per-image costs are shown on each card.
04
Informed Vote
User votes again with full price context. Did knowing the cost change anything?
XDR-V + XDR-S
05
Reveal
Model identities unmasked. Savings calculated. Data logged.

Why Three Rankings?

Benchmarks measure capability. Chatbot Arena measures preference. Neither tells you what matters most: is this model worth the money for your use case?

By collecting two votes per matchup — one blind, one price-informed — we can separate pure quality judgment from price-adjusted value. The gap between those two votes reveals Stickiness: the models people genuinely prefer regardless of cost.

A model with high XDR-Q but low XDR-V is impressive but overpriced. A model with moderate XDR-Q but high XDR-V is a hidden gem. A model with high XDR-S is the real deal — people pick it and don't look back.

Every vote makes the rankings smarter

Start XDuel →View XDRating →