Freetext: How to label unknown classes
In the last post, we re-annotated the Animals-10 dataset using the given 10 categories, plus Something Else. It turned out that quite a few images ended up in this mystery bucket. Today we are going to label them!
What to do with our bucket of mystery images?
We only know that the Something Else images are unlikely to belong to one of the original 10 categories, and it is nearly impossible to set up a classification task that lets users annotate arbitrary animals.
But don't despair, Freetext is here to the rescue! It has the power to label anything.
What? How is that possible? From the user's perspective, it looks like this:
They can simply type any animal into the textbox, which allows for infinite possibilities.
Let's go
Setting this one up is super easy. First we create a client to interact with the Rapidata API and find all images in the directory:
You might be prompted to log in if it's the first time you are calling rapidata.RapidataClient(). After executing this cell, we have all images stored in image_paths.
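For the image-discovery half of this step, here is a minimal sketch using only the standard library. The helper name find_images, the folder name and the list of extensions are assumptions made for illustration, not part of the Rapidata client; adjust them to your own setup.

```python
from pathlib import Path

# Assumption: these are the image extensions present in the dataset folder.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_images(directory):
    """Return sorted paths (as strings) of all image files under `directory`."""
    root = Path(directory)
    if not root.is_dir():
        # A missing folder simply yields no images instead of raising.
        return []
    return sorted(
        str(path)
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


# Hypothetical folder name; point this at wherever the Something Else images live.
image_paths = find_images("something_else")
```

Sorting the paths keeps the order stable between runs, which makes it easier to match results back to images later.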
Now, we can simply run a Free Text Order on these images like this:
We collect five responses per image by setting .responses(5). Also, the shortest animal name I can think of is ox, so I set the minimum characters to two with .minimum_characters(2).
Depending on how many images we gave, this might take a few minutes. We can check the progress with
Once the order has completed, we can fetch the results:
Results
Here are 10 sample responses (from various images):
- Es un lobo
- Passaro
- Slepi miš
- Donkey
- León
- ديك رومى
- ماعز
- 灰色的仓鼠
- Pavo real
- Schwein
Perhaps this is not the result you expected. As our platform operates globally, we get responses in many languages. Our end goal is to know which animal each image shows, so for every image we want a single corresponding English label. Luckily, OpenAI's ChatGPT knows plenty of vocabulary and can even correct small typos.
Here's how we can get English labels:
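As a rough sketch of this translation step (not the exact code used for this post), the logic can be split into building a prompt and normalizing the reply. The prompt wording and the function names are assumptions, and ask_model is a hypothetical callable that sends a prompt to whichever chat model you use and returns its text reply.

```python
def build_prompt(response):
    """Build an instruction asking a chat model for one English animal name."""
    return (
        "The following text is a free-text answer naming an animal, possibly "
        "in another language or with typos. Reply with only the English name "
        "of the animal, in lowercase, singular form.\n\n"
        f"Answer: {response}"
    )


def to_english_label(response, ask_model):
    """Translate one raw response; `ask_model` is a hypothetical prompt -> reply callable."""
    reply = ask_model(build_prompt(response))
    # Normalize so that "Wolf." and " wolf" collapse to the same label.
    return reply.strip().strip(".!\"'").lower()


def translate_all(responses, ask_model):
    """Return (raw response, English label) pairs for a list of responses."""
    return [(raw, to_english_label(raw, ask_model)) for raw in responses]
```

Keeping the model call behind a plain callable means the normalization can be tested offline with a stub, and the model can be swapped without touching the rest of the pipeline.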
This gives us:
- Es un lobo | wolf
- Passaro | bird
- Slepi miš | dormouse
- Donkey | donkey
- León | lion
- ديك رومى | turkey
- ماعز | goat
- 灰色的仓鼠 | hamster
- Pavo real | peacock
- Schwein | pig
Much better!
Lastly, I grouped the translated responses by image and only took the ones where we had at least 3 voters in agreement. This resulted in 103 images categorized into 34 different classes. The labeled images can be found on HuggingFace.
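The grouping step described above can be sketched as a simple majority vote. This is a minimal illustration, assuming the translated responses are available as (image path, English label) pairs; the function name consensus_labels and that data layout are assumptions, not the exact code behind the published dataset.

```python
from collections import Counter, defaultdict


def consensus_labels(translated, min_votes=3):
    """Map each image to its most common label if at least `min_votes` responses agree.

    `translated` is an iterable of (image_path, english_label) pairs.
    """
    votes = defaultdict(Counter)
    for image, label in translated:
        votes[image][label] += 1

    labeled = {}
    for image, counter in votes.items():
        label, count = counter.most_common(1)[0]
        if count >= min_votes:
            labeled[image] = label
    return labeled
```

With five responses per image and a threshold of three, at most one label can reach the threshold, so ties are not a concern. Images without a clear majority are simply left out of the result.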
A few examples of the labeled images: