Projects - AI Alignment

Sparse Autoencoders

SAE Feature Image

Sparse autoencoders (SAEs) are an unsupervised technique for decomposing a model’s activations into interpretable feature vectors. I recently published a set of sparse autoencoders for the residual stream of GPT-2 Small here. You can browse these features on Neuronpedia.
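The decomposition described above can be sketched in a few lines. This is a minimal, illustrative forward pass, not the published SAEs: the dimensions, weight initialisation, and loss coefficient are all assumptions chosen for the toy example.

```python
import numpy as np

# Minimal sparse autoencoder sketch (toy sizes, assumed hyperparameters).
rng = np.random.default_rng(0)

d_model = 8    # residual-stream width (toy value)
d_sae = 32     # overcomplete dictionary of learned features

W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode activations into non-negative feature coefficients, then decode."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU keeps feature activations >= 0
    x_hat = f @ W_dec + b_dec               # reconstruct activations from features
    return f, x_hat

x = rng.normal(size=(4, d_model))           # a batch of model activations
f, x_hat = sae_forward(x)

# Training would minimise reconstruction error plus an L1 sparsity penalty,
# which pushes most feature coefficients to exactly zero:
recon_loss = np.mean((x - x_hat) ** 2)
l1_penalty = np.mean(np.abs(f).sum(axis=-1))
loss = recon_loss + 1e-3 * l1_penalty
```

The key design choice is the overcomplete dictionary (`d_sae > d_model`) combined with the L1 penalty: together they encourage each activation to be explained by a small number of interpretable directions.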

My research into sparse autoencoders is currently being supervised by Neel Nanda at the MATS Program.

Decision Transformer Interpretability


In this project I apply the mathematical framework for transformer circuits to Decision Transformers, a reinforcement learning method designed to produce agents that can simulate players of arbitrary quality. This project gave me a deeper understanding of many mechanistic interpretability techniques, of the nuances of studying circuits, and of how to look for goal representations inside neural networks.
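The "simulate players of arbitrary quality" framing comes from how decision transformers condition on returns. Here is an illustrative sketch, not the project's actual code: the rewards, token names, and sequence layout are assumptions used to show the idea.

```python
import numpy as np

# Toy episode: reward arrives only at the final step.
rewards = np.array([0.0, 0.0, 1.0])

# Return-to-go at step t is the sum of rewards from t onward; conditioning
# the model on a high initial return-to-go asks it to act like a
# high-quality player, a low one like a weak player.
rtg = np.cumsum(rewards[::-1])[::-1]

states = ["s0", "s1", "s2"]      # placeholder state tokens (assumed)
actions = ["a0", "a1", "a2"]     # placeholder action tokens (assumed)

# Interleave (return-to-go, state, action) triples into the sequence
# the transformer is trained on, treating RL as sequence modelling.
sequence = []
for r, s, a in zip(rtg, states, actions):
    sequence.extend([("rtg", float(r)), ("state", s), ("action", a)])
```

At inference time the practitioner chooses the initial return-to-go, which is what makes the trained model a controllable simulator of players at different skill levels.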

You can find an initial write-up here. The main GitHub repo for the project is here. I published an update with some findings here, which I then applied to understanding spelling in GPT-J here.


ARENA

ARENA (Alignment Research ENgineering Accelerator) was a 9-week research engineering accelerator I participated in, during which we completed a series of increasingly sophisticated projects, culminating in my capstone on Decision Transformer Interpretability. These projects included:

Projects - Computational Biology

I studied computational biology at university and have worked on a number of projects in this area. These include: