Analyzing Latent Representations of
Trained MAMLs

Interpreting a popular meta-learning algorithm via novel methods

Meta-learning is learning how to learn. Model-Agnostic Meta-Learning (MAML) is an extremely popular optimization-based meta-learning algorithm. In a team of three, I spent ten weeks researching the role of the adaptation (inner-loop) learning rate in MAML. We mathematically derived a theoretical interpretation and designed a novel Protonet-style classification algorithm to empirically analyze the latent representations. We ended up earning a perfect score on the project (as sophomores) in a class of mostly graduate students.

I contributed to all aspects of the project, whether it was ideation, implementation, or visualization.
  • Timeline
  • Sep - Dec 2022
  • Skills
  • Research
  • Team
  • Ryan Lian
    Victor Kolev
    Sri Jaladi
  • Tools
  • PyTorch
    Higher
    Scikit-learn
    Matplotlib

Context

MAML is a tremendously popular meta-learning algorithm built around a bi-level optimization problem (an inner and an outer loop). Although this structure gives MAML its flexibility and effectiveness across a wide range of use cases, it also makes the algorithm harder to interpret and understand than typical learning algorithms. Motivated by the mysterious phenomenon that the optimal inner learning rate can actually be negative, we set out to investigate and interpret the role of the inner learning rate in MAML. Given our compute limits, we mainly used a classic meta-learning set-up with the Mini-ImageNet dataset (more details in the report).
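To make the bi-level structure concrete, here is a minimal second-order MAML sketch on a toy 1-D regression family. Everything below is a hypothetical illustration (our actual experiments used Mini-ImageNet classification with the `higher` library); the toy task family, shapes, and hyperparameters are assumptions for the sketch.

```python
import torch

torch.manual_seed(0)

def loss_fn(w, x, y):
    # mean squared error for a linear model y_hat = x @ w
    return ((x @ w - y) ** 2).mean()

def sample_task():
    # each task: regression onto a line with a random slope
    slope = torch.randn(1)
    def data(n):
        x = torch.randn(n, 1)
        return x, x.squeeze(1) * slope
    return data

w_meta = torch.zeros(1, requires_grad=True)      # meta-learned initialization
inner_lr = 0.1                                   # the adaptation learning rate we studied
meta_opt = torch.optim.SGD([w_meta], lr=0.01)    # outer-loop optimizer

for step in range(100):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):                           # a meta-batch of tasks
        data = sample_task()
        x_s, y_s = data(10)                      # support set
        # inner loop: one gradient step; create_graph=True keeps the
        # computation graph so the outer gradient flows through the update
        g = torch.autograd.grad(loss_fn(w_meta, x_s, y_s),
                                w_meta, create_graph=True)[0]
        w_task = w_meta - inner_lr * g           # task-adapted parameters
        x_q, y_q = data(10)                      # query set from the same task
        meta_loss = meta_loss + loss_fn(w_task, x_q, y_q)
    meta_loss.backward()                         # outer loop: update the init
    meta_opt.step()
```

The inner update is the only place `inner_lr` appears, which is what makes its role (and the possibility of it being negative) so curious.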

Given the current literature, how can we interpret the inner learning rate of MAML in a way that explains the mysterious negative-learning-rate phenomenon?

Process

Presentation

I presented at the poster session to 200+ students and faculty members.

Outcome

(Read the paper on the cover for a detailed explanation)

Empirical Results

We discovered that at meta-test time, MAML acts merely as a simple logistic regression mapping cluster centers of the global representations to the task-specific labels, leaving the inner learning rate largely irrelevant.
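The flavor of this comparison can be sketched with synthetic embeddings standing in for frozen MAML features (the embeddings, dimensions, and shot counts here are assumptions for illustration, not our actual experimental setup): a Protonet-style nearest-prototype classifier and a logistic-regression head behave very similarly when the representations already cluster by class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical frozen "global representations": 5-way, 5-shot support
# embeddings drawn around per-class centers, plus 15 queries per class.
n_way, k_shot, n_query, dim = 5, 5, 15, 16
centers = rng.normal(size=(n_way, dim)) * 3
support = centers[:, None, :] + rng.normal(size=(n_way, k_shot, dim))
queries = centers[:, None, :] + rng.normal(size=(n_way, n_query, dim))

q = queries.reshape(-1, dim)
q_labels = np.repeat(np.arange(n_way), n_query)

# Protonet-style: classify each query by its nearest class prototype
prototypes = support.mean(axis=1)                       # (n_way, dim) cluster centers
dists = ((q[:, None, :] - prototypes[None]) ** 2).sum(-1)
proto_acc = (dists.argmin(1) == q_labels).mean()

# Logistic-regression head fit on the same frozen features
clf = LogisticRegression(max_iter=1000).fit(
    support.reshape(-1, dim), np.repeat(np.arange(n_way), k_shot))
lr_acc = (clf.predict(q) == q_labels).mean()

print(f"prototype acc: {proto_acc:.2f}, logistic acc: {lr_acc:.2f}")
```

When the representation already separates classes, both heads succeed, which is consistent with the adapted MAML head adding little beyond a logistic map from cluster centers to labels.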

Theoretical Results

We showed that at meta-training time, the MAML objective involves a term similar to the gradient penalty regularization seen in GANs, with the inner learning rate acting as a regularization constant, which explains why it can be negative.
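A first-order sketch of where that regularization term comes from (notation is ours; the full derivation is in the report). With support loss $L_S$ and query loss $L_Q$, one inner step followed by a Taylor expansion gives:

```latex
% one inner-loop step from initialization \theta with inner learning rate \alpha
\theta' = \theta - \alpha \, \nabla L_S(\theta)

% first-order expansion of the query loss around \theta
L_Q(\theta') \approx L_Q(\theta) - \alpha \, \nabla L_Q(\theta)^{\top} \nabla L_S(\theta)
```

When the support and query gradients are aligned, the inner product behaves like a squared gradient norm, so a negative $\alpha$ turns the subtracted term into an added $|\alpha|\,\|\nabla L\|^2$-style penalty, mirroring GAN gradient penalties with $\alpha$ as the regularization constant.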

Award

We received a perfect score on this 10-week-long custom research project in CS 330: Deep Multi-Task and Meta-Learning, a class full of graduate students.

Reflections & Takeaways

Complicated models can be understood and broken down! Teamwork is exciting no matter the backgrounds involved, and everyone has a perspective to contribute. Also, the more you play around with PyTorch, the more sense it makes.