Elana P. Simon
I’m in the third year of my PhD at Stanford University, advised by James Zou, working on various projects at the intersection of machine learning and biology. Previously I worked at Reverie Labs as an ML engineer helping design small molecule cancer drugs. As an undergrad, I studied Computer Science at Harvard and worked with Debora Marks on protein language models. I have also been quite involved with research, fundraising, and patient advocacy focused on Fibrolamellar Hepatocellular Carcinoma.
Currently I’m very excited about trying to understand what ML models are actually learning from protein sequences and structures - digging into their embeddings to find interpretable biological concepts, figuring out how they pick up these patterns during training, and seeing what biology we can uncover by reverse-engineering the models!
I also aim to write up a bunch of ML-bio deep-dives in my blog (matols), but those are pretty high-effort and thus low-frequency.
selected publications
- Detection of a recurrent DNAJB1-PRKACA chimeric transcript in fibrolamellar hepatocellular carcinoma. Science, 2014
- Protein design and variant prediction using autoregressive generative models. Nature Communications, Apr 2021
- ChemBERTa-2: Towards chemical foundation models. ELLIS Machine Learning for Molecule Discovery Workshop, Dec 2021
- Language models for biological research: a primer. Nature Methods, Aug 2024
- InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders. bioRxiv, Nov 2024