Rapidata

Rank	Creator	Model	Bradley-Terry	Elo	Wins	Matches
1	OpenAI	4o-26-3-25	1223.06	1069.41	154132	280080
2	Google DeepMind	imagen-4.0-ultra-generate-exp-05-20	1123.81	1038.61	113833	213662
3	Black Forest Labs	flux-1-pro	1115.33	1035.91	954822	1782245
4	Hidream AI	hidream-I1-full	1104.85	1032.23	119781	228968
5	Black Forest Labs	flux-1.1-pro	1068.69	1020.31	608342	1176689
6	Reve AI	halfmoon-4-4-25	1025.72	1005.40	216092	427028
7	Recraft	recraft-v2	1023.97	1004.78	554233	1088852
8	Ideogram	ideogram	1023.87	1004.77	136179	271093
9	xAI	aurora-20-1-25	1005.28	998.05	387139	775560
10	Google DeepMind	imagen-3	1003.17	997.27	572274	1157702
11	OpenGVLab	lumina-17-2-25	995.47	994.60	174504	347832
12	Runway	frames-23-1-25	973.01	986.17	328147	668309
13	OpenAI	dalle-3	940.40	973.73	793028	1631331
14	Stability AI	stable-diffusion-3	922.19	966.61	642333	1332290
15	Midjourney	midjourney-5.2	887.12	952.49	710518	1515232
16	DeepSeek	janus-7b	810.12	919.66	125049	285923
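
As a quick sanity check on the table, the Wins and Matches columns can be combined into a raw win rate. The sketch below does this for two rows copied from the table; the helper name `win_rate` is ours for illustration and is not part of any Rapidata tooling. Raw win rate ignores how strong each opponent was, so it need not follow the rank order, which is based on the Bradley-Terry score.

```python
def win_rate(wins: int, matches: int) -> float:
    """Fraction of matches won (hypothetical helper, for illustration only)."""
    return wins / matches


# (model, wins, matches) copied from the table above
rows = [
    ("imagen-4.0-ultra-generate-exp-05-20", 113833, 213662),  # rank 2
    ("flux-1-pro", 954822, 1782245),                          # rank 3
]

for model, wins, matches in rows:
    print(f"{model}: {win_rate(wins, matches):.2%}")
```

In this pair the lower-ranked model has the slightly higher raw win rate, which shows why the ranking does not rest on win rate alone.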

What is "Bradley-Terry"?

The Bradley-Terry model is a probabilistic model for predicting the outcome of pairwise comparisons. It assigns each item a strength parameter (the reported score); the higher an item's strength relative to its opponent's, the more likely it is to win the comparison. See the Wikipedia article for mathematical details.
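
To make the idea concrete, here is a minimal sketch of how Bradley-Terry strengths can be fitted from pairwise win counts. In the model, item i beats item j with probability p_i / (p_i + p_j). The code uses a standard iterative (minorization-maximization) update on made-up toy counts. It is an illustration only: it is not necessarily the procedure behind this leaderboard, and the function names, the toy data, and the rescaling that produces the reported scores are all assumptions on our part.

```python
def fit_bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry strengths.

    wins[i][j] is the number of times item i beat item j.
    Assumes every item has at least one win and one loss.
    """
    n = len(wins)
    p = [1.0] * n  # start with equal strengths
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Comparisons against each opponent, weighted by current strengths
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        # Rescale so strengths average to 1 (only ratios matter)
        total = sum(new_p)
        p = [x * n / total for x in new_p]
    return p


def win_probability(p_i, p_j):
    """Modelled probability that the item with strength p_i beats p_j."""
    return p_i / (p_i + p_j)


# Toy data (made up): three models, ten comparisons per pair
toy_wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(toy_wins)
print(strengths)
print(win_probability(strengths[0], strengths[1]))
```

Only the ratios between strengths carry meaning, so they can be shifted and rescaled freely for display without changing the ranking.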

What do we consider as "Overall preference"?

Here we evaluate the models across all criteria and determine which one has the best overall performance.

All results are based directly on feedback from real human raters. How we arrived at the results is best described in our blog post.

Want us to evaluate your model?