Projects - AI Alignment

Decision Transformer Interpretability


In this project I show that the approach from A Mathematical Framework for Transformer Circuits, which is used to understand how Large Language Models work, can be extended to Decision Transformers, a reinforcement learning method designed to produce AI which can simulate players of arbitrary quality.

You can find an initial write-up here. I’m currently working on this project and hope to publish more soon, including analysis of circuits in Decision Transformers which involve memory/language and variable goals.

ARENA

ARENA (Alignment Research ENgineering Accelerator) was a 9-week research engineering accelerator I participated in, during which we completed a series of increasingly sophisticated projects, culminating in my capstone on Decision Transformer Interpretability. These projects included:

Projects - Computational Biology

I studied computational biology at university and have worked on a number of projects in this area. These include:
