Elana P. Simon

I’m excited about trying to understand what ML models are learning from biological sequences and structures: digging into their embeddings to find interpretable biological concepts, figuring out how they pick up these patterns during training, and seeing what biology we can uncover by reverse-engineering the models!

I’m doing a PhD at Stanford University, advised by James Zou. I also write ML-bio deep-dives on my blog (matmols), though these are pretty high-effort and thus low-frequency.

Previously I worked at Reverie Labs as an ML engineer helping design small molecule cancer drugs. As an undergrad, I studied Computer Science at Harvard and worked with Debora Marks on protein language models. I’ve also been quite involved with research, fundraising, and patient advocacy focused on Fibrolamellar Hepatocellular Carcinoma.

selected publications

  1. Detection of a recurrent DNAJB1-PRKACA chimeric transcript in fibrolamellar hepatocellular carcinoma
    *Joshua N Honeyman, *Elana P Simon, Nicolas Robine, and 8 more authors
    Science, 2014
  2. Protein design and variant prediction using autoregressive generative models
    Jung-Eun Shin, Adam J Riesselman, Aaron W Kollasch, and 6 more authors
    Nature Communications, Apr 2021
  3. ChemBERTa-2: Towards chemical foundation models
    *Walid Ahmad, *Elana Simon, Seyone Chithrananda, and 2 more authors
    ELLIS Machine Learning for Molecule Discovery Workshop, Dec 2021
  4. Language models for biological research: a primer
    *Elana Simon, *Kyle Swanson, and James Zou
    Nature Methods, Aug 2024
  5. InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders
    Elana Simon and James Zou
    bioRxiv, Nov 2024