Hi there, I’m Joseph.
I’m an aspiring Effective Altruist and Rationalist. I care deeply about humanity, our future, and all life. I love learning, and I’ve spent considerable time developing aptitudes in software engineering, communication, and research.
Currently I’m participating in ARENA, a London-based research engineering accelerator, where I’m learning to work with large language models like GPT and BERT. I resigned from my job earlier this year after receiving an FTX Future Fund regrant to pursue longtermist work.
I joined the start-up Mass Dynamics in August 2020 as its first-ever Data Scientist and first non-founding member of the science team. There I built tools and interfaces in Python (and R) for a proteomics SaaS, performed algorithmic research, and published novel scientific tools.
The year before, I completed a double degree in Computational Biology and Statistics/Stochastic Processes at the University of Melbourne, while working as a research assistant in the Buckle Protein Engineering lab. There I worked on:
- engineering viral proteases using Hidden Markov Models,
- understanding the structure/dynamics of antibodies (on which I have a joint first-author paper), and
- understanding the structure/dynamics of the SERPIN protein superfamily.
Lately I’ve been thinking about… (Last updated 17th of November)
In Effective Altruism/Rationality: I’ve been thinking about how we respond to bad things happening, and the choices that face us afterwards. Sometimes it’s important to accept reality (the territory) and others’ responses to it (like, perhaps, prioritising x-risks over other cause areas). This can be frustrating in a visceral way that is separate and distinct from the sadness that comes from a disappointing reality. I think reflecting on this might help us coordinate better, by putting us in a mindset more suitable for assessing information impartially.
In Machine Learning: As part of ARENA, we’ve been building large language models (LLMs) such as GPT-2 and BERT. I’ve been trying to understand why the cosine similarity matrix of GPT-2’s (learned) positional embeddings looks the way it does:
For comparison, the fixed positional embeddings from the original “Attention Is All You Need” paper would look like this:
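(As a concrete reference, here’s a quick NumPy sketch of those fixed sinusoidal embeddings, following the formula from the paper; 1024 and 768 are GPT-2 small’s context length and residual-stream width, chosen here just to make the two sets of embeddings comparable.)

```python
import numpy as np

def sinusoidal_embeddings(n_positions: int, d_model: int) -> np.ndarray:
    """Fixed positional embeddings from 'Attention Is All You Need':
    PE[pos, 2i] = sin(pos / 10000**(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(n_positions)[:, None]      # (n_positions, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model // 2)
    angles = pos / (10000 ** (2 * i / d_model))
    emb = np.empty((n_positions, d_model))
    emb[:, 0::2] = np.sin(angles)              # even dims: sine
    emb[:, 1::2] = np.cos(angles)              # odd dims: cosine
    return emb

# GPT-2 small's dimensions, for an apples-to-apples comparison.
fixed = sinusoidal_embeddings(1024, 768)
```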
The fixed positional embeddings have a dot product that decays with distance, which makes conceptual sense, but GPT-2 appears to have learned a less orthogonal embedding. I wonder whether this is related to model capacity; I’ll come back to it later. I’ve asked a variety of experienced researchers, who weren’t sure either.
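(The similarity matrix itself is cheap to compute. A sketch: for the real plot I pass in GPT-2’s learned embeddings via HuggingFace transformers — `GPT2Model.from_pretrained("gpt2").wpe.weight` — but any `(n_positions, d_model)` matrix works; here random Gaussian rows stand in, which also illustrates the orthogonality baseline, since random high-dimensional vectors are nearly orthogonal.)

```python
import numpy as np

def cosine_similarity_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of emb."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return normed @ normed.T

# Stand-in for learned embeddings: random 768-d rows are near-orthogonal,
# so off-diagonal similarities hover around zero.
rng = np.random.default_rng(0)
sim = cosine_similarity_matrix(rng.standard_normal((1024, 768)))
```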