In a previous blog post and paper we presented a benchmark for evaluating generative text-to-image models based on a large-scale preference dataset of more than 2 million responses collected from real humans. This dataset was acquired in just a few days using Rapidata's platform, and in this post we will show how you can easily set up and run the annotation process to collect a large preference dataset yourself.
The Data
The preference dataset is made up of a large number of pairwise comparisons between images generated by different models. For this demo, you can download a small dataset of images generated using Flux.1 [pro] and Stable Diffusion 3 Medium, available on our Hugging Face page. The dataset contains the relevant images (images.zip) as well as a CSV file defining the matchups (matchups.csv).
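If you prefer to fetch the files programmatically, the huggingface_hub package can download them. The repository id below is a placeholder; substitute the actual dataset name from our Hugging Face page.

```python
from huggingface_hub import hf_hub_download

# "rapidata/<dataset-name>" is a placeholder; use the actual
# repository id from the Hugging Face page.
images_zip = hf_hub_download(
    repo_id="rapidata/<dataset-name>",
    filename="images.zip",
    repo_type="dataset",
)
matchups_csv = hf_hub_download(
    repo_id="rapidata/<dataset-name>",
    filename="matchups.csv",
    repo_type="dataset",
)
```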
Configuring and Starting the Annotation Process
For the annotation setup we will use the Rapidata API through our Python package, which is available on PyPI. To install the package, run:
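```bash
pip install rapidata
```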
If you are interested in learning more about the package, take a closer look at the documentation. However, this is not needed to follow along with this guide.
As a first step, import the necessary packages and create the client object that will be used in configuring the rest of the setup. When running this code for the first time it will open up a browser window, prompting you to log in. After successfully logging in, it will then save the credentials to ~/.config/rapidata/credentials.json for future use.
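A minimal sketch, assuming the package exposes the RapidataClient entry point:

```python
from rapidata import RapidataClient

# On first run this opens a browser window for login and caches the
# credentials under ~/.config/rapidata/credentials.json.
client = RapidataClient()
```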
Now, let us load the data from the dataset. The CSV is neatly formatted, with each image pair alongside its prompt, making this straightforward. The image paths and prompts are stored so they can be used in the order later. For demonstration purposes, I sample a subset of the pairs; with the order settings used below, this should allow the necessary number of responses to be collected in less than ten minutes. If you do not mind waiting, feel free to include more pairs.
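A sketch of the loading step. The column names ("image1", "image2", "prompt") are assumptions; adjust them to whatever matchups.csv actually uses.

```python
import pandas as pd

matchups = pd.read_csv("matchups.csv")

# For the demo, sample a small subset so the order finishes quickly.
matchups = matchups.sample(n=20, random_state=42)

# Column names are assumptions; adjust to match matchups.csv.
image_pairs = [[a, b] for a, b in zip(matchups["image1"], matchups["image2"])]
prompts = matchups["prompt"].tolist()
```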
The next step is to create the order using our API. In this case we specify 15 responses per matchup and attach a validation set, which ensures that the labelers understand the task. We have prepared a predefined validation set for this specific task, but these can also be customized if needed. Consult the documentation or reach out to us for more information.
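Below is a sketch of what the order creation could look like. The method name create_compare_order and its parameter names are assumptions based on the package's compare-order API and may differ between versions, and the validation set id is a placeholder; consult the documentation for the exact signature and the predefined validation sets.

```python
# A sketch of the order setup; parameter names may differ between
# package versions, see the documentation for the current signature.
order = client.order.create_compare_order(
    name="Flux.1 [pro] vs Stable Diffusion 3 Medium",
    instruction="Which image fits the description better?",
    datapoints=image_pairs,       # one [image_a, image_b] pair per matchup
    contexts=prompts,             # the generation prompt shown with each pair
    responses_per_datapoint=15,   # 15 responses per matchup
    validation_set_id="<your-validation-set-id>",  # placeholder
)
```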
So far we have not consulted any humans. Calling the .run() method on the order starts the annotation process.
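```python
# Publishes the order and starts collecting responses from labelers.
order.run()
```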
Fetching the Results
Once the order has finished, you can easily get the results through the order object by calling the .get_results() method.
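```python
# Retrieve the collected responses once the order has finished.
results = order.get_results()
```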
If the kernel has been restarted, you can find the order object again using the client.order.find_orders() method.
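For instance, assuming find_orders accepts a name filter (check the documentation for the exact arguments):

```python
# The name filter is an assumption; see the documentation for the
# arguments accepted by find_orders().
orders = client.order.find_orders(name="Flux.1 [pro] vs Stable Diffusion 3 Medium")
order = orders[0]
```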
Analyzing the Results
The raw results come as a JSON object; for analysis purposes, we can extract them into a pandas DataFrame using the utility function below.
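A sketch of what get_df_from_results() could look like. The keys used below ("results", "context", "images", "votes") are assumptions about the results schema; adapt them to the actual JSON returned by get_results().

```python
import pandas as pd

def get_df_from_results(results: dict) -> pd.DataFrame:
    """Flatten the results JSON into one row per matchup.

    The field names below are assumptions about the schema; adjust
    them to match the actual JSON returned by get_results().
    """
    rows = []
    for item in results["results"]:
        image1, image2 = item["images"]
        votes1, votes2 = item["votes"]
        rows.append({
            "prompt": item["context"],
            "image1": image1,
            "image2": image2,
            "votes1": votes1,
            "votes2": votes2,
        })
    return pd.DataFrame(rows)

df = get_df_from_results(results)
```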
To find a winner between the two models, we can, for example, look at which model received the most votes.
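For example, summing the votes per model, assuming the DataFrame columns sketched above and that each filename encodes the model that generated it:

```python
def model_of(filename: str) -> str:
    # Assumes the filename encodes the generating model, e.g. "flux_017.jpg".
    return "Flux.1 [pro]" if "flux" in filename.lower() else "Stable Diffusion 3 Medium"

# Tally the total votes each model received across all matchups.
totals: dict[str, int] = {}
for _, row in df.iterrows():
    totals[model_of(row["image1"])] = totals.get(model_of(row["image1"]), 0) + row["votes1"]
    totals[model_of(row["image2"])] = totals.get(model_of(row["image2"]), 0) + row["votes2"]

winner = max(totals, key=totals.get)
print(totals, "->", winner)
```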
Visualization
The following function provides a simple visualization of the individual matchups and the votes they received, similar to the image shown below.
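A minimal sketch of plot_image_comparison() using matplotlib, assuming the DataFrame columns from above and locally available image files:

```python
import matplotlib.pyplot as plt
from PIL import Image

def plot_image_comparison(row) -> None:
    """Show a single matchup side by side with the votes each image received."""
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    fig.suptitle(row["prompt"], wrap=True)
    for ax, img_col, votes_col in zip(axes, ["image1", "image2"], ["votes1", "votes2"]):
        ax.imshow(Image.open(row[img_col]))
        ax.set_title(f"{row[votes_col]} votes")
        ax.axis("off")
    plt.show()

# Visualize the first matchup in the DataFrame.
plot_image_comparison(df.iloc[0])
```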
Conclusion
In this blog post, you have seen how easily you can start collecting preference data from real humans through the Rapidata API with just a few lines of code. This guide serves as a starting point; you are now ready to customize the setup to your specific needs. If you have any questions or need help, feel free to reach out to us at info@rapidata.ai.