Overview

I worked with two teammates to investigate the depth properties of Deep Equilibrium Models (DEQs), a new class of implicit deep learning (DL) sequential models that mimic "infinite"-layer neural networks. I had zero prior ML experience and completed this initial research project in only 7 weeks, including the literature review and learning PyTorch, Google Cloud Compute, Git, and more. I designed a task, classifying symbolic arithmetic expressions, that allowed us to test the depth properties of DEQs. Our project was awarded "Best Project" in a hallmark graduate-level course (my team was all freshmen), demonstrating my proficiency in DL sequential models and my ability to pick up and creatively apply dense technical concepts and tools.

Timeline
Mar - Jun 2022

Skills
Research

Team
Ryan Lian
Sarah Fujimori
Aditi Talati

Tools
PyTorch
Scikit-learn
GCP
Git

Context

Unlike traditional neural networks, which stack many layers of weights and nonlinearities, DEQs model the solution of a fixed-point equation that is a continuous analogue of residual networks. In other words, a DEQ models a network in which a single “layer” is applied repeatedly until convergence, and thus behaves like an “infinite”-layer network. These models have the potential to outperform traditional networks on tasks that require a deeper architecture, but most recent work has focused on larger-scale applications. Because of our compute limitations, we had to get creative and design a smaller-scale experiment that still highlighted DEQs’ advantages and would allow us to explore further.
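
To make the "apply one layer until convergence" idea concrete, here is a minimal sketch in plain Python. It is a toy, not the project's model: the scalar layer `tanh(w*z + u*x)`, the weights `w` and `u`, and the helper names `layer` and `solve_fixed_point` are all assumptions chosen for illustration. Because `|w| < 1`, this toy layer is a contraction, so repeated application settles on a fixed point.

```python
import math

def layer(z, x, w=0.5, u=1.0):
    """One application of a toy scalar "layer": z -> tanh(w*z + u*x).

    w and u are made-up weights; |w| < 1 makes the map a contraction.
    """
    return math.tanh(w * z + u * x)

def solve_fixed_point(x, tol=1e-10, max_iter=1000):
    """Apply the same layer repeatedly until the state stops changing."""
    z = 0.0
    for step in range(1, max_iter + 1):
        z_next = layer(z, x)
        if abs(z_next - z) < tol:
            return z_next, step
        z = z_next
    return z, max_iter

z_star, steps = solve_fixed_point(x=0.3)
print(f"z* = {z_star:.6f} after {steps} applications of the layer")
print(f"residual |f(z*, x) - z*| = {abs(layer(z_star, 0.3) - z_star):.2e}")
```

The returned `z_star` plays the role of the output of the "infinite"-layer network: applying the layer once more leaves it essentially unchanged. Real DEQs use vector-valued layers and faster root solvers than this naive loop.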

How can we investigate the infinite-depth property of DEQs on a smaller scale?

Coming into this project, I had never really touched machine learning or sequential modeling, PyTorch, Git, Google Cloud Compute (or even shell commands), Scikit-learn, and more. The class did not teach deep learning or even RNNs, so I had to research and learn EVERYTHING along the way. I also did not have an extensive CS background.

Everything was so uncertain for a long time and my friends all told me this was a mistake, but I just loved the content so much I clung on to finish this project!

Action

Custom Task

I designed a novel task: classifying symbolic arithmetic expressions. By randomly generating these expressions with a fixed length, I hoped to create expressions with a wide range of complexity. The amount of nesting (the number of parentheses) would produce a variety of complex nested sequential structures that DEQs could hopefully capture better.
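
A rough sketch of what such a generator could look like is below. It is not the project's actual data pipeline: the operator set, the single-digit operands, full parenthesization, and the helper names `random_expression` and `max_nesting_depth` are all assumptions. The class labels used in the project are not described here, so the sketch only generates expressions and measures their nesting depth.

```python
import random

OPS = ["+", "-", "*"]  # assumed operator set, for illustration only

def random_expression(num_operands, rng):
    """Build a random, fully parenthesized expression over single digits."""
    if num_operands == 1:
        return str(rng.randint(0, 9))
    # Randomly split the operands between a left and a right sub-expression.
    left_size = rng.randint(1, num_operands - 1)
    left = random_expression(left_size, rng)
    right = random_expression(num_operands - left_size, rng)
    return f"({left} {rng.choice(OPS)} {right})"

def max_nesting_depth(expression):
    """Return the deepest level of parenthesis nesting in the expression."""
    depth = deepest = 0
    for ch in expression:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest

rng = random.Random(0)  # fixed seed so the samples are reproducible
samples = [random_expression(6, rng) for _ in range(5)]
for s in samples:
    print(s, "depth:", max_nesting_depth(s))
```

With six single-digit operands and full parenthesization, every expression has the same character length, while the nesting depth still varies from sample to sample depending on how the random splits fall.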

Implementation

Using PyTorch, we implemented and tuned the DEQ model with various architectures, including a GRU layer, a layered GRU, and an MLP.

Visualization & Analysis

We used PCA through Scikit-learn and Matplotlib to analyze and visualize the train and test data, and compared the models based on accuracy, recall, and precision.
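
For reference, the three comparison metrics can be written out from their definitions. The sketch below is plain Python rather than the Scikit-learn calls used in the project. It assumes binary labels (0/1), and the example labels are made up, not project data.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Made-up labels purely to exercise the function.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision asks how many predicted positives were correct, while recall asks how many actual positives were found, so the two can diverge even when accuracy looks similar across models.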

Outcome

(Read the paper on the cover for a detailed explanation)

Results

We discovered that DEQs do a better job of capturing these variable, complex nested structures, outperforming the other methods we implemented by 2.6%.

New Modeling Potential

What was more interesting was the wider modeling potential we discovered. Theoretically, this method should not work if the layer doesn’t converge to a fixed point (the math would not make sense), but we found empirically that it still works. We interpreted this through the root solver and provided empirical results demonstrating the phenomenon.
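
As a rough illustration of what "not converging" looks like from the solver's side, the toy sketch below runs a naive forward iteration with a fixed budget on two scalar layers. It does not reproduce our result or our interpretation. The layer `tanh(w*z + x)`, the weights, and the helper `run_solver` are assumptions for illustration. The point is only that a budgeted solver always returns some iterate, and the residual `|f(z) - z|` tells you whether that iterate is actually a fixed point.

```python
import math

def layer(z, x, w):
    """Toy scalar layer; |w| < 1 is contractive, larger |w| need not be."""
    return math.tanh(w * z + x)

def run_solver(x, w, max_iter=50):
    """Naive forward iteration with a fixed budget.

    Returns the last iterate and its residual |f(z) - z|, whether or not
    the iteration has settled on a fixed point.
    """
    z = 0.0
    for _ in range(max_iter):
        z = layer(z, x, w)
    residual = abs(layer(z, x, w) - z)
    return z, residual

_, contractive_residual = run_solver(x=0.5, w=0.5)
_, oscillating_residual = run_solver(x=0.5, w=-3.0)
print(f"w =  0.5 -> residual {contractive_residual:.2e}")
print(f"w = -3.0 -> residual {oscillating_residual:.2e}")
```

With `w = -3.0` the iterates bounce between two values instead of settling, so the residual stays large, yet the solver still hands back a state that a downstream classifier could consume.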

Award

We were happy to receive the Best Project award in CS 229: Machine Learning out of 250+ students. This is a hallmark graduate-level course with many PhD and Master's students, and I was a first-year undergrad with no prior experience.

Reflections & Takeaways

Anything is possible! I feel confident picking up new technologies and learning new things even without a tremendous amount of prior background. The project also showed me how exciting AI research can be, and seeing it go from 0 to 100 expanded my perception of what I’m capable of. I never grew up building anything, so I had insecurities about my engineering abilities, but this project gave me more confidence.

Investigating Depth Properties of Deep Equilibrium Models

Researching a novel class of implicit neural models
