| Rank | Creator | Model | Bradley-Terry | Elo | Wins | Matches |
|---|---|---|---|---|---|---|
| 1 | OpenAI | 4o-26-3-25 | 1223.06 | 1069.41 | 154132 | 280080 |
| 2 | Google DeepMind | imagen-4.0-ultra-generate-exp-05-20 | 1123.81 | 1038.61 | 113833 | 213662 |
| 3 | Black Forest Labs | flux-1-pro | 1115.33 | 1035.91 | 954822 | 1782245 |
| 4 | Hidream AI | hidream-I1-full | 1104.85 | 1032.23 | 119781 | 228968 |
| 5 | Black Forest Labs | flux-1.1-pro | 1068.69 | 1020.31 | 608342 | 1176689 |
| 6 | Reve AI | halfmoon-4-4-25 | 1025.72 | 1005.40 | 216092 | 427028 |
| 7 | Recraft | recraft-v2 | 1023.97 | 1004.78 | 554233 | 1088852 |
| 8 | Ideogram | ideogram | 1023.87 | 1004.77 | 136179 | 271093 |
| 9 | xAI | aurora-20-1-25 | 1005.28 | 998.05 | 387139 | 775560 |
| 10 | Google DeepMind | imagen-3 | 1003.17 | 997.27 | 572274 | 1157702 |
| 11 | OpenGVLab | lumina-17-2-25 | 995.47 | 994.60 | 174504 | 347832 |
| 12 | Runway | frames-23-1-25 | 973.01 | 986.17 | 328147 | 668309 |
| 13 | OpenAI | dalle-3 | 940.40 | 973.73 | 793028 | 1631331 |
| 14 | Stability AI | stable-diffusion-3 | 922.19 | 966.61 | 642333 | 1332290 |
| 15 | Midjourney | midjourney-5.2 | 887.12 | 952.49 | 710518 | 1515232 |
| 16 | DeepSeek | janus-7b | 810.12 | 919.66 | 125049 | 285923 |

What is "Bradley-Terry"?

The Bradley-Terry model is a probabilistic model for predicting the outcome of pairwise comparisons. It assigns each item a strength parameter (the reported score); the higher an item's strength relative to its opponent's, the more likely it is to win the comparison. See the Wikipedia article on the Bradley-Terry model for the mathematical details.
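To make the idea concrete, here is a minimal Python sketch of how Bradley-Terry strengths can be estimated from pairwise win counts. In the standard formulation, the probability that item i beats item j is p_i / (p_i + p_j), where p_i and p_j are the items' strengths. The fitting routine below uses a simple iterative (minorization-maximization) update, which is one common way to fit the model and not necessarily the procedure used for this leaderboard. The function names and the example counts are hypothetical, and the sketch does not reproduce the scale of the scores reported in the table, since that conversion is not described here.

```python
from collections import defaultdict


def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins maps (winner, loser) to the number of times winner beat loser.
    Returns a dict of strengths normalized to sum to 1.
    Assumes every item has at least one win and one loss.
    """
    items = sorted({name for pair in wins for name in pair})
    strength = {name: 1.0 for name in items}

    total_wins = defaultdict(float)  # wins per item
    games = defaultdict(float)       # comparisons per unordered pair
    for (winner, loser), count in wins.items():
        total_wins[winner] += count
        games[frozenset((winner, loser))] += count

    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = total_wins[i] / denom if denom > 0 else strength[i]
        norm = sum(updated.values())
        strength = {name: value / norm for name, value in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Probability that a beats b under the Bradley-Terry model."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical win counts, for illustration only.
example_wins = {
    ("model_a", "model_b"): 7, ("model_b", "model_a"): 3,
    ("model_b", "model_c"): 6, ("model_c", "model_b"): 4,
    ("model_a", "model_c"): 8, ("model_c", "model_a"): 2,
}
example_strengths = fit_bradley_terry(example_wins)
for name in sorted(example_strengths, key=example_strengths.get, reverse=True):
    print(name, round(example_strengths[name], 3))
```

With only two items, the fitted win probability reduces to the observed win rate, which is a quick sanity check on the update rule.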

What do we consider as "Overall preference"?

Here we evaluate the models across all criteria and determine which one has the best overall performance.

All results are based directly on feedback from real human raters. The process by which we arrived at these results is described in our blog post.

Examples

Visual examples of the annotators’ preferences

Preference
Which image looks better overall?
Image pair: FLUX.1 [pro] (marked as winner) vs. Midjourney
Coherence
Which image feels less weird or unnatural for its style when you look closely? I.e. fewer odd or strange-looking objects or elements
Image pair: FLUX.1 [pro] (marked as winner) vs. Midjourney
Alignment
Which image is more aligned with and better adheres to the prompt:
A black and white picture of a white man singing a song
Image pair: FLUX.1 [pro] (marked as winner) vs. Midjourney