Ava Amini

I am a Principal Researcher at Microsoft Research in Cambridge, MA.
My research focuses on developing new AI methods to understand and design biology, with the ultimate aim of realizing precision biomedicines to improve human health. To this end, I co-lead Project Ex Vivo, a collaborative effort between Microsoft and the Broad Institute focused on defining, engineering, and targeting cell states in cancer.

I completed my PhD in Biophysics at Harvard University, where I worked with Sangeeta Bhatia at the Koch Institute for Integrative Cancer Research and was supported by the NSF Graduate Research Fellowship. I received my Bachelor of Science in Computer Science and Molecular Biology from MIT, where I was recognized as a Henry Ford II Scholar and with the AMITA Academic Award.

Research and Publications

My research focuses on developing new AI methods to understand and design biology, with the ultimate aim of realizing precision biomedicines to improve human health. My work bridges the computational and experimental worlds to create new diagnostic and therapeutic biotechnologies and to achieve new insights into cancer.

Formerly Ava Soleimany. *Denotes co-first authorship. Denotes corresponding authors.
Hierarchical cross-entropy loss improves atlas-scale single-cell annotation models
Sebastiano Cultrera di Montesano, Davide D'Ascenzo, Srivatsan Raghavan, Ava P. Amini, Peter S. Winter, Lorin Crawford
bioRxiv, 2025
pdf / code

We introduce a hierarchical cross-entropy loss for single-cell cell type annotation, finding that this simple modification significantly improves out-of-distribution performance without added computational cost.

Zero-shot evaluation reveals limitations of single-cell foundation models
Kasia Z. Kedzierska, Lorin Crawford, Ava P. Amini, Alex X. Lu
Genome Biology, 2025
pdf / code

We evaluate the performance of single-cell foundation models in "zero-shot" settings where they are used without any further training, finding that they are outperformed by simpler methods.

ProtNote: a multimodal method for protein-function annotation
Samir Char, Nathaniel Corley, Sarah Alamdari, Kevin K. Yang, Ava P. Amini
Bioinformatics, 2025
code

ProtNote is a multimodal deep learning model that leverages free-form text to enable both supervised and zero-shot protein function prediction.

Causal integration of chemical structures improves representations of microscopy images for morphological profiling
Yemin Yu, Neil Tenenholtz, Lester Mackey, Ying Wei, David Alvarez-Melis, Ava P. Amini, Alex X. Lu
arXiv, 2025
pdf / code

We introduce a new deep learning framework, MICON, that models chemical compounds as treatments that induce counterfactual transformations of cell phenotypes for improved representation learning in microscopy-based morphological profiling.

Artificial variables help to avoid over-clustering in single-cell RNA sequencing
Alan DenAdel, Michelle L. Ramseier, Andrew W. Navia, Alex K. Shalek, Srivatsan Raghavan, Peter S. Winter, Ava P. Amini, Lorin Crawford
American Journal of Human Genetics, 2025
code

We develop a method for protecting against over-clustering in the analysis of single-cell RNA sequencing data by controlling for the impact of using the same data twice when performing differential expression analysis.

Deep learning guided design of protease substrates
Carmen Martin-Alonso*, Sarah Alamdari*, Tahoura S. Samad, Kevin K. Yang, Sangeeta N. Bhatia, Ava P. Amini
bioRxiv, 2025
pdf / code

We present CleaveNet, an end-to-end AI pipeline for the design of peptide-based protease substrates, enabling generation of peptides guided by a target cleavage profile for the design of efficient and selective substrates.

Toward deep learning sequence–structure co-generation for protein design
Chentong Wang, Sarah Alamdari, Carles Domingo-Enrich, Ava P. Amini, Kevin K. Yang
Current Opinion in Structural Biology, 2025

We review recent advances in deep generative models for protein design, with a particular focus on sequence-structure co-generation methods.

Consequences of training data composition for deep learning models in single-cell biology
Ajay Nadig, Akshaya Thoutam, Madeline Hughes, Anay Gupta, Andrew W. Navia, Nicolo Fusi, Srivatsan Raghavan, Peter S. Winter, Ava P. Amini, Lorin Crawford
bioRxiv, 2025
pdf / code

We systematically investigate the consequences of training dataset composition on the behavior of deep learning models of single-cell transcriptomics, focusing on human hematopoiesis as a tractable model system and including cells from adult and developing tissues, disease states, and perturbation atlases.

Benchmarking uncertainty quantification for protein engineering
Kevin P. Greenman, Ava P. Amini, Kevin K. Yang
PLOS Computational Biology, 2025
pdf / code

We assess deep learning-based uncertainty quantification methods on protein sequence-function prediction tasks.

Evaluating the role of pre-training dataset size and diversity on single-cell foundation model performance
Alan DenAdel, Madeline Hughes, Akshaya Thoutam, Anay Gupta, Andrew W. Navia, Nicolo Fusi, Srivatsan Raghavan, Peter S. Winter, Ava P. Amini, Lorin Crawford
bioRxiv, 2024
pdf / code

We investigate the role of pre-training dataset size and diversity on the performance of single-cell foundation models on both zero-shot and fine-tuned tasks, finding that current methods plateau in performance with pre-training datasets that are only a fraction of the full size.

Deeper evaluation of a single-cell foundation model
Rebecca Boiarsky, Nalini M. Singh, Alejandro Buendia, Ava P. Amini, Gad Getz, David Sontag
Nature Machine Intelligence, 2024
pdf / code

We take a deep dive into scBERT, a recently developed transformer model for single-cell RNA-sequencing data, to develop a deeper understanding of the potential benefits and limitations of single-cell foundation models.

Protein generation with evolutionary diffusion: sequence is all you need
Sarah Alamdari, Nitya Thakkar, Rianne van den Berg, Neil Tenenholtz, Robert Strome, Alan M. Moses, Alex X. Lu, Nicolo Fusi, Ava P. Amini, Kevin K. Yang
bioRxiv, 2024
pdf / code

We develop EvoDiff, a general-purpose discrete diffusion model over protein sequences that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models for controllable protein generation in sequence space.

Scalable, compressed phenotypic screening using pooled perturbations
Nuo Liu, Walaa E. Kattan, Benjamin E. Mead, Conner Kummerlowe, Thomas Cheng, Sarah Ingabire, Jamie H. Cheah, Christian K. Soule, Anita Vrcic, Jane K. McIninch, Sergio Triana, Manuel Guzman, Tyler T. Dao, Joshua M. Peters, Kristen E. Lowder, Lorin Crawford, Ava P. Amini, Paul C. Blainey, William C. Hahn, Brian Cleary, Bryan Bryson, Peter S. Winter, Srivatsan Raghavan, Alex K. Shalek
Nature Biotechnology, 2024
pdf

We establish a method of pooling perturbations, like chemical compounds, followed by computational deconvolution to reduce required sample size, labor, and cost in high-throughput phenotypic screens.

Mutation and cell state compatibility is required and targetable in Ph+ acute lymphoblastic leukemia minimal residual disease
Peter S. Winter*, Michelle L. Ramseier*, Andrew W. Navia*, Sachit Saksena, Haley Strouf, Nezha Senhaji, Alan DenAdel, Mahnoor Mirza, Hyun Hwan An, Laura Bilal, Peter Dennis, Catherine S. Leahy, Kay Shigemori, Jennyfer Galves-Reyes, Ye Zhang, Foster Powers, Nolawit Mulugeta, Alejandro J. Gupta, Nicholas Calistri, Alex Van Scoyk, Kristen Jones, Huiyun Liu, Kristen E. Stevenson, Siyang Ren, Marlise R. Luskin, Charles P. Couturier, Ava P. Amini, Srivatsan Raghavan, Robert J. Kimmerling, Mark M. Stevens, Lorin Crawford, David M. Weinstock, Scott R. Manalis, Alex K. Shalek, Mark A. Murakami
bioRxiv, 2024
pdf

Using patient-derived xenograft (PDX) models and clinical trial specimens of acute lymphoblastic leukemia (ALL), we examined how genetic and transcriptional features co-evolve to drive progression during prolonged tyrosine kinase inhibitor response, uncovering a landscape of cooperative mutational and transcriptional escape mechanisms that differ from those causing resistance to first-generation inhibitors.

Feature reuse and scaling: Understanding transfer learning with protein language models
Francesca-Zhoufan Li, Ava P. Amini, Yisong Yue, Kevin K. Yang, Alex X. Lu
International Conference on Machine Learning, 2024
pdf

To understand how the features learned in pretraining protein language models (PLMs) relate to and are useful for downstream tasks, we perform a systematic analysis of transfer learning using PLMs, conducting 370 experiments across a comprehensive suite of factors including different downstream tasks, architectures, model sizes, model depths, and pretraining time.

Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data
Chibuikem Nwizu, Madeline Hughes, Michelle L. Ramseier, Andrew W. Navia, Alex K. Shalek, Nicolo Fusi, Srivatsan Raghavan, Peter S. Winter, Ava P. Amini, Lorin Crawford
bioRxiv, 2024
pdf

We developed a non-parametric infinite mixture model that leverages Bayesian sparse priors to identify marker genes while simultaneously performing clustering on single-cell expression data.

Protein structure generation via folding diffusion
Kevin E. Wu, Kevin K. Yang, Rianne van den Berg, Sarah Alamdari, James Y. Zou, Alex X. Lu, Ava P. Amini
Nature Communications, 2024
pdf / code

We present FoldingDiff, a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process.

Priming agents transiently reduce the clearance of cell-free DNA to improve liquid biopsies
Carmen Martin-Alonso*, Shervin Tabrizi*, Kan Xiong*, Timothy Blewett, Sainetra Sridhar, Andjela Crnjac, Sahil Patel, Zhenyi An, Ahmet Bekdemir, Douglas Shea, Shih-Ting Wang, Sergio Rodriguez-Aponte, Christopher A. Naranjo, Justin Rhoades, Jesse D. Kirkpatrick, Heather E. Fleming, Ava P. Amini, Todd R. Golub, J. Christopher Love, Sangeeta N. Bhatia, Viktor A. Adalsteinsson
Science, 2024
pdf / MIT press / general press

We develop intravenous priming agents that are given prior to a blood draw to increase the abundance of cell-free DNA in circulation, improving the sensitivity of liquid biopsy cancer diagnostic assays.

Deep self-supervised learning for biosynthetic gene cluster detection and product classification
Carolina Rios-Martinez, Nicholas Bhattacharya, Ava P. Amini, Lorin Crawford, Kevin K. Yang
PLOS Computational Biology, 2023
pdf

We develop a self-supervised masked language model for biosynthetic gene clusters in bacteria, and leverage it for natural product classification.

From noise to protein with image models
Ava P. Amini, Kevin K. Yang
Nature Computational Science, 2023
pdf

A commentary on a recent image-inspired diffusion model for generating new protein structures.

Low protease activity in B cell follicles promotes retention of intact antigens after immunization
Aereas Aung, Ang Cui, Laura Maiorino, Ava P. Amini, Justin R. Gregory, Maurice Bukenya, Yiming Zhang, Heya Lee, Christopher A. Cottrell, Duncan M. Morgan, Murillo Silva, Heikyung Suh, Jesse D. Kirkpatrick, Parastoo Amlashi, Tanaka Remba, Leah M. Froehle, Shuhao Xiao, Wuhbet Abraham, Josetta Adams, J. Christopher Love, Phillip Huyett, Douglas S. Kwon, Nir Hacohen, William R. Schief, Sangeeta N. Bhatia, Darrell J. Irvine
Science, 2023
pdf / MIT press / general press

We discover "sanctuaries" within lymph nodes that contain low proteolytic activity and act as a safe haven for vaccines, and demonstrate that this heterogeneity can be exploited to enhance vaccine-induced antibody response.

Multiscale profiling of protease activity in cancer
Ava P. Amini*, Jesse D. Kirkpatrick*, Cathy S. Wang, Alex M. Jaeger, Susan Su, Santiago Naranjo, Qian Zhong, Tyler Jacks, Sangeeta N. Bhatia
Nature Communications, 2022
pdf / supplement

We engineer an integrated set of methods for measuring specific protease activities across the organismal, tissue, and cellular scales, and unify these methods into a methodological hierarchy that powers new biological insights into cancer.

Protease Activity Analysis: A toolkit for analyzing enzyme activity data
Ava P. Soleimany*, Carmen Martin-Alonso*, Melodi Anahtar*, Cathy S. Wang, Sangeeta N. Bhatia
ACS Omega, 2022
pdf / code

We build Protease Activity Analysis (PAA), a Python software package with a collection of data analytic and machine learning tools for analyzing protease activity data.

Host protease activity classifies pneumonia etiology
Melodi Anahtar, Leslie W. Chan, Henry Ko, Aditya Rao, Ava P. Soleimany, Purvesh Khatri, Sangeeta N. Bhatia
PNAS, 2022
pdf / code / press

We develop a sensor-based, ML-driven system to diagnose pneumonia and classify its etiology, using machine learning to classify directly from molecular barcodes.

Protease activity sensors enable real-time treatment response monitoring in lymphangioleiomyomatosis
Jesse D. Kirkpatrick, Ava P. Soleimany, Jaideep S. Dudani, Heng-Jia Liu, Hilaire C. Lam, Carmen Priolo, Elizabeth P. Henske, Sangeeta N. Bhatia
European Respiratory Journal, 2022
pdf

We establish a sensor-based, ML-driven diagnostic for noninvasive, real-time monitoring of disease in a preclinical model of lymphangioleiomyomatosis (LAM), a rare lung disease.

Ionic liquid-mediated transdermal delivery of thrombosis-detecting nanosensors
Ahmet Bekdemir, Eden E.L. Tanner, Jesse D. Kirkpatrick, Ava P. Soleimany, Samir Mitragotri, Sangeeta N. Bhatia
Advanced Healthcare Materials, 2022
pdf / supplement

We design an easily applicable, non-invasive formulation to deliver diagnostic nanosensors through the skin, enabling a sustained release diagnostic monitoring system for detecting thrombosis.

Synthetic circuit-driven expression of heterologous enzymes for disease detection
Jiang He*, Lior Nissim*, Ava P. Soleimany*, Adina Binder-Nissim, Heather E. Fleming, Timothy K. Lu, Sangeeta N. Bhatia
ACS Synthetic Biology, 2021
pdf / supplement

We design a sense-and-respond system that integrates a synthetic gene circuit and nanotechnology detection tools for tumor-specific expression of heterologous biomarkers.

Evidential deep learning for guided molecular property prediction and discovery
Ava P. Soleimany*, Alexander Amini*, Samuel Goldman*, Daniela Rus, Sangeeta N. Bhatia, Connor W. Coley
ACS Central Science, 2021
pdf / supplement

A fast, scalable approach for uncertainty quantification in neural networks enables uncertainty-aware molecular property prediction, accelerated property optimization, and guided virtual screening.

Activatable zymography probes enable in situ localization of protease dysregulation in cancer
Ava P. Soleimany*, Jesse D. Kirkpatrick*, Susan Su, Jaideep S. Dudani, Qian Zhong, Ahmet Bekdemir, Sangeeta N. Bhatia
Cancer Research, 2021
pdf / supplement

We engineer a new class of enzyme activity probes that can be applied to fresh-frozen tissue sections to spatially localize protease activity, enabling new insights into the biology of protease dysregulation.

Deep evidential regression
Alexander Amini, Wilko Schwarting, Ava Soleimany, Daniela Rus
NeurIPS, 2020
pdf / press

We develop a novel algorithm for fast, scalable uncertainty quantification in highly complex, non-linear neural networks trained for regression tasks.

Pharmacokinetic tuning of protein-antigen fusions enhances the immunogenicity of T-cell vaccines
Naveen K. Mehta, Roma V. Pradhan, Ava P. Soleimany, Kelly D. Moynihan, Adrienne M. Rothschilds, Noor Momin, Kavya Rakhra, Jordi Mata-Fink, Sangeeta N. Bhatia, K. Dane Wittrup, Darrell J. Irvine
Nature Biomedical Engineering, 2020
pdf / supplement / press

We optimize the immunogenicity of peptide-based antitumor vaccines in mice by tuning their pharmacokinetics via fusion of the peptide epitopes to protein carriers.

Activity-based diagnostics: an emerging paradigm for disease detection and monitoring
Ava P. Soleimany, Sangeeta N. Bhatia
Trends in Molecular Medicine, 2020
pdf

A review detailing how integrating techniques from multiple disciplines has yielded engineered diagnostics that are selectively activated in disease states, highlighting their potential to realize the goals of precision medicine.

Urinary detection of lung cancer in mice via noninvasive pulmonary protease profiling
Jesse D. Kirkpatrick*, Andrew D. Warren*, Ava P. Soleimany*, Peter M. K. Westcott, Justin C. Voog, Carmen Martin-Alonso, Heather E. Fleming, Tuomas Tammela, Tyler Jacks, Sangeeta N. Bhatia
Science Translational Medicine, 2020
supplement / press / video

We couple protease-responsive nanoparticle sensors with machine learning to engineer a sensitive and specific urinary test for lung cancer detection.

Genetic encoding of targeted MRI contrast agents for tumor imaging
Simone Schuerle, Maiko Furubayashi, Ava P. Soleimany, Tinotenda Gwisai, Wei Huang, Christopher A. Voigt, Sangeeta N. Bhatia
ACS Synthetic Biology, 2020
supplement

Magnetic nanoparticles that display genetically encoded targeting peptides to promote tumor accumulation and enhance MRI contrast.

Renal clearable catalytic gold nanoclusters for in vivo disease monitoring
Colleen Loynachan*, Ava P. Soleimany*, Jaideep S. Dudani, Yiyang Lin, Adrian Najer, Ahmet Bekdemir, Qu Chen, Sangeeta N. Bhatia, Molly M. Stevens
Nature Nanotechnology, 2019
supplement / data / press / video

By leveraging the unique properties of catalytic nanomaterials, we develop a simple color-change urine test for detection of cancer in mice.

Image segmentation of liver stage malaria infection with spatial uncertainty sampling
Ava P. Soleimany, Harini Suresh, Jose Javier Gonzalez Ortiz, Divya Shanmugam, Nil Gural, John Guttag, Sangeeta N. Bhatia
ICML Workshop on Computational Biology, 2019

Convolutional neural networks for automated segmentation and uncertainty estimation of microscopy images of malaria infection.

Synthetic and living micropropellers for convection-enhanced nanoparticle transport
Simone Schuerle, Ava P. Soleimany, Tiffany Yeh, Giridhar M. Anand, Moritz Haberli, Heather E. Fleming, Nima Mirkhani, Famin Qiu, Sabine Hauert, Xiaopu Wang, Bradley J. Nelson, Sangeeta N. Bhatia
Science Advances, 2019
supplement / press / video

Engineered microrobots that use magnetism to push drug-delivery nanoparticles out of blood vessels and into diseased tissue.

Uncovering and mitigating algorithmic bias through learned latent structure
Alexander Amini*, Ava P. Soleimany*, Wilko Schwarting, Sangeeta N. Bhatia, Daniela Rus
AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2019
MIT press / VentureBeat press

Generalizable algorithm for mitigating hidden biases within training data, by leveraging learned latent distributions to adaptively re-weight the importance of certain data points while training.

Spatial uncertainty sampling for end-to-end control
Alexander Amini, Ava Soleimany, Sertac Karaman, Daniela Rus
NeurIPS Workshop on Bayesian Deep Learning, 2017

Estimating uncertainty in neural networks for end-to-end control by exploiting feature map correlations during training.

Synthetic recombinase-based state machines in living cells
Nathaniel Roquet, Ava P. Soleimany, Alyssa C. Ferris, Scott Aaronson, Timothy K. Lu
Science, 2016
supplement / press / blog

Programming biological state machines that enable cells to remember and respond to a series of events.

Teaching and Leadership

In addition to research, I am passionate about education and leadership, and strive to help and empower others to excel in their own pursuits.

Introduction to Deep Learning, MIT 6.S191

I am an organizer and lecturer for Introduction to Deep Learning (6.S191), MIT’s official introductory course on deep learning foundations and applications. Together with Alexander Amini, I have organized and developed all aspects of the course, including developing the curriculum, teaching the lectures, creating software labs, and collaborating with industry sponsors. All materials can be found online on the course website.

Co-founder, Momentum AI

I am a co-founder and director of Momentum AI, an outreach program that teaches AI and machine learning to under-resourced and under-served high school students from the greater Boston area. Our two-week capstone program is a free, project-based deep dive into AI and is held on MIT's campus.

Teaching Fellow, Harvard MCB294, Fall 2019, with Nancy Kleckner
Teaching Assistant, MIT 7.05, Spring 2016, with Matt Vander Heiden and Mike Yaffe

Teaching Assistant, MIT 7.05, Spring 2015, with Matt Vander Heiden and Mike Yaffe
Select Publicly-Available Talks