I have been working on the Abstraction and Reasoning Challenge (ARC) and came up with this approach. I hope you find it useful in some way. If you want to run the prediction as a Kaggle kernel, you can check out this notebook.
Learn a function
Meta-learning is the process of learning how to learn. A meta-learning algorithm takes in a distribution of tasks, where each task is a learning problem, and produces a quick learner: a learner that can generalize from a small number of examples. MAML is one of the best-known meta-learning approaches, but it requires computing second-order derivatives (which is expensive). Another interesting approach is the Reptile algorithm. It is mathematically similar to first-order MAML and performs a very similar update, and Reptile's performance is comparable to MAML's on the Omniglot and Mini-ImageNet benchmarks. So I decided to stick with Reptile instead of MAML.
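To make that concrete, here is a minimal sketch of the Reptile update itself; this is my paraphrase of the rule from the Reptile paper, not code from this repo, and `reptile_update` is a hypothetical helper name. The point is that there is no gradient through the inner loop, so no Hessians are needed.

```python
# Hedged sketch of the Reptile meta-update. `meta_weights` and `task_weights`
# stand in for the model parameters before and after inner-loop SGD on one task.
def reptile_update(meta_weights, task_weights, epsilon=0.1):
    # epsilon is the meta step size; the update is a plain linear interpolation
    # between the current meta-weights and the task-adapted weights.
    return [w + epsilon * (tw - w) for w, tw in zip(meta_weights, task_weights)]

print(reptile_update([0.0, 1.0], [1.0, 0.0]))  # [0.1, 0.9]
```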
The ARC dataset is divided into training, evaluation, and test sets. 100 tasks from the evaluation set are part of the test set, and I use these 100 tasks for validation. Each task can have 3 to 5 training images, and images can vary in size, both across and within tasks, from 2x2 to 30x30. The model weights are going to be shared among the tasks, so we need a model that is independent of image size. I therefore decided to add an extra, 11th class that represents non-existent pixels for a task. Say the common size is 15x15 and a task's image is 10x10; I then pad the image with this empty class until it is 15x15 (a minimal padding sketch follows the list below). This has two advantages (hopefully):
- Allows us to have a common size across tasks (important if you want to experiment with CNNs)
- Introduces the concept of emptiness and varying size to our model.
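Here is a minimal sketch of that padding scheme. The helper name `pad_grid` and the choice of index 10 for the empty class (the ten ARC colors occupy 0-9, making 11 classes in total) are my assumptions for illustration, not code from the repo.

```python
import numpy as np

EMPTY = 10  # assumed index of the extra "empty pixel" class; colors occupy 0-9

def pad_grid(grid, size):
    # Fill a size x size canvas with the empty class, then copy the real
    # grid into the top-left corner.
    grid = np.asarray(grid)
    padded = np.full((size, size), EMPTY, dtype=grid.dtype)
    padded[: grid.shape[0], : grid.shape[1]] = grid
    return padded

# A 10x10 grid padded to the 15x15 common size from the example above
print(pad_grid(np.zeros((10, 10), dtype=int), 15).shape)  # (15, 15)
```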
I experimented with CNNs but found their performance lacking. In the current code, you can see that I am using a Transformer model. I reshape the image into a sequence of pixels and pass it to the embedding layer first, followed by a positional encoding layer to account for the order of the pixels. I use PyTorch's implementation of TransformerEncoder. The output of the TransformerEncoder is sent to the final Linear layer, which gives us logits for each class (we have 11 classes in total). Then we use the usual cross-entropy loss for model training.
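A minimal sketch of that architecture is below. The layer sizes (d_model, nhead, num_layers) and the learned positional encoding are illustrative assumptions, not the exact configuration in arc_transformer.py.

```python
import torch
import torch.nn as nn

class ARCTransformer(nn.Module):
    def __init__(self, num_classes=11, d_model=64, nhead=4, num_layers=2, max_len=900):
        super().__init__()
        self.embed = nn.Embedding(num_classes, d_model)            # one token per pixel class
        self.pos = nn.Parameter(torch.zeros(max_len, 1, d_model))  # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_classes)                # per-pixel logits over 11 classes

    def forward(self, grid):
        # grid: (batch, H, W) integer pixel classes in [0, 10]
        b, h, w = grid.shape
        x = grid.reshape(b, h * w).t()         # (seq_len, batch): flatten pixels into a sequence
        x = self.embed(x) + self.pos[: h * w]  # add positional encoding for pixel order
        x = self.encoder(x)                    # (seq_len, batch, d_model)
        return self.head(x)
```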
Since we are learning an abstract representation of cognitive priors, model training needs an outer loop that iterates over all the tasks. Inside it, we train the Transformer on each task using that task's specific inputs and outputs; this is our inner loop. Then we perform a gradient update that interpolates between the current model weights and the weights trained on this task. I found the illustration in the MAML paper very useful for building intuition about this process.
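Putting the two loops together, here is a hedged sketch of what that meta-training might look like. The function name `reptile_train`, the task iterable, and the hyperparameter values are assumptions for illustration, not the code in arc_transformer.py.

```python
import copy
import torch
import torch.nn.functional as F

def reptile_train(model, tasks, meta_lr=0.1, inner_lr=1e-3, inner_steps=5, meta_epochs=10):
    # `tasks` is assumed to yield (inputs, targets) tensor pairs of shape (batch, H, W)
    for _ in range(meta_epochs):
        for inputs, targets in tasks:                      # outer loop over tasks
            task_model = copy.deepcopy(model)              # inner loop starts from the meta-weights
            opt = torch.optim.SGD(task_model.parameters(), lr=inner_lr)
            for _ in range(inner_steps):                   # inner loop: plain SGD on one task
                logits = task_model(inputs)                # (seq_len, batch, 11)
                loss = F.cross_entropy(
                    logits.permute(1, 0, 2).reshape(-1, 11),  # match (batch, seq) target order
                    targets.reshape(-1),
                )
                opt.zero_grad()
                loss.backward()
                opt.step()
            with torch.no_grad():                          # Reptile update: interpolate between
                for p, q in zip(model.parameters(), task_model.parameters()):
                    p.add_(meta_lr * (q - p))              # current and task-trained weights
    return model
```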
- arc_transformer.py performs training over all the tasks.
- transformer_validation.py evaluates the performance on our validation set.
- transformer_preds.py makes predictions on the test set.
I use a Transformer model here, but you can easily replace it with any CNN or other deep learning model that you desire. I just wanted to introduce the idea of meta-learning and how it may be pertinent to the ARC dataset.