Neural Networks

Neural Networks in Biology

ABodyBuilder2: predict the structure of antibodies

AlphaFold3: successor to AlphaFold2 (AF2) that also predicts complexes with ligands, DNA, RNA, and post-translational modifications

ATOMICA: universal representations of intermolecular interactions including protein-small molecule, protein-ion, small molecule-small molecule, protein-protein, protein-peptide, protein-RNA, protein-DNA, and nucleic acid-small molecule complexes

BE-DICT: predicts outcomes of CRISPR base editing

CellOT: neural optimal transport model predicting how cells transition across time, used in development and disease

Chroma: vector database with a retriever; use for storing and querying protein design history, experimental metadata, and model outputs

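ClinVar: public NCBI database of reported relationships between genetic variants and disease, with clinical significance labels such as pathogenic or benign (a database, not a neural network)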

Cortex: modular architecture for deep learning systems; used as a general-purpose scoring oracle in antibody optimization pipelines

DBNN: models biological molecular circuits as neural networks instead of discrete Boolean gates; used for in silico testing before wet-lab verification

DeepAccNet: protein model accuracy evaluator

DeepAIR: predicts TCR/BCR-antigen binding affinity and reactivity

DeepRC: modern Hopfield network with attention-based pooling that classifies immune repertoires, for example by disease status

DeepVirFinder: identifies viral sequences in metagenomic data

DeepZF: predicts zinc finger (ZF)-DNA binding

DiffDock: diffusion model for ligand-protein docking; reported to outperform traditional search-and-score docking methods on benchmarks

dWJS: discrete walk-jump sampling, which combines score-based and energy-based models for antibody generation; used for exploring novel sequence variants far from known binders

DyAb: uses pair-wise representation to predict differences in protein properties, rather than absolute values; use when prioritizing/generating antibody mutations for property shifts

Enformer: gene expression prediction over long range DNA sequences

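ESM models: protein language models trained on sequences; their embeddings support structure prediction (ESMFold) and other downstream tasks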

EvoDiff: generative protein design via discrete diffusion over sequences; $10^{30}$ drug-like molecules possible

Evo2: genomic language model trained on the OpenGenome2 dataset; predicts mutation effects, function, and fitness, and generates DNA sequences

FoldX: predicts how mutations affect a protein’s stability and interactions (empirical force field, not a neural network)

GEARS: predicts how gene expression will change when specific genes are edited (knocked out, over-expressed) (Shift Biosciences)

GNINA: deep learning framework for molecular docking

GPN-MSA: DNA language model trained on whole-genome multiple sequence alignments across species; predicts which variants in the human genome are likely deleterious

GraphBAN: predict compound-protein interactions

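ESM3: protein foundation model that reasons jointly over sequence, structure, and function; its authors describe generating a new fluorescent protein as equivalent to simulating 500M years of evolution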

mosGraphGPT: foundation model trained on multi-omic data to predict cell states and disease outcomes across species (transformer-based attention)

LaMBO-2: multi-objective optimization in discrete (categorical) sequence space; use when optimizing antibodies for multiple properties at once, like affinity and expression

LBSTER: base model you can fine-tune or use to embed antibodies; its representations are learned from UniRef50 plus antibody sequence space

LigandMPNN: deep learning model that allows explicit modeling of small molecule, nucleotide, metal, and other atomic contexts

LungTCR: lung cancer prediction based on T-cell receptor data

MMseqs2: sequence search and clustering tool for deduplication and reducing redundancy in training sets; use when cleaning data before model training (not a neural network)

NEHVI: noisy expected hypervolume improvement, an acquisition function that ranks candidates by how much they are expected to expand the Pareto front; use when selecting non-dominated sequences under noisy measurements

PanGenie: haplotype-aware genotyping using pangenome graphs

PepTune: de novo generation of therapeutic peptides by gradually improving random sequences against multiple design goals, such as hitting a target, lasting in the body, and being easy to make

ProtGPT2: de novo protein design, autoregressive transformer trained on UniRef50 for protein generation

ProteinMPNN: sequence design neural network that takes a fixed 3D backbone structure and predicts amino acid sequences likely to fold into it; use it after generating a novel 3D structure but before experimental expression

PropEn: learns to transform low-affinity sequences into similar high-affinity ones from matched pairs of weak and strong binders; use to improve a property without training a separate predictor

RFdiffusion2: diffusion model for protein design that takes atomic coordinates of scaffolding residues as input

scGPT: foundation model that identifies and classifies cells from single-cell RNA-seq expression profiles (Shift Biosciences)

scVI: variational auto-encoder for analyzing single-cell RNA-seq, used in single-cell workflows

SeqVDM: variational diffusion model adapted for protein sequences, used for continuous control over diversity and stability of designs

soNNia: NN that predicts sequence-based TCR/BCR binding reactivity

StripedHyena: hybrid architecture that interleaves Hyena long-convolution operators with attention layers; use when modeling long protein sequences/genomes with limited compute

TCRAI: CNN + MIL model that classifies TCR-antigen binding and immune repertoire

Viral Mutation: language model for virus evolution
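
The multi-objective entries above (LaMBO-2, NEHVI) both rest on the idea of a Pareto front and the hypervolume it dominates. The sketch below illustrates only that underlying idea for two maximized objectives with exact, noise-free scores; it is not NEHVI itself, which additionally averages over a surrogate model's uncertainty. All function names, the reference point, and the example scores (read here as hypothetical affinity and expression values) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both maximized


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Return the non-dominated points, sorted for reproducibility."""
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(q, p) for q in unique))


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by the Pareto front and bounded below by the reference point."""
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    # Sorted by objective 1 descending, objective 2 increases along the front,
    # so each point adds one horizontal slab of new area.
    front.sort(key=lambda p: p[0], reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


def hypervolume_improvement(candidate: Point, observed: Sequence[Point], ref: Point) -> float:
    """Gain in dominated area if the candidate were added to the observed set."""
    return hypervolume_2d(list(observed) + [candidate], ref) - hypervolume_2d(observed, ref)


# Hypothetical (affinity, expression) scores for three already-measured designs.
observed = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
ref = (0.0, 0.0)
candidates = [(2.5, 2.5), (1.0, 1.0)]
ranked = sorted(candidates, key=lambda c: hypervolume_improvement(c, observed, ref), reverse=True)
```

Ranking by improvement puts `(2.5, 2.5)` first because it pushes the front outward, while `(1.0, 1.0)` is dominated by existing designs and adds no area.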

Tools

AutoDock Vina: ligand-protein docking and screening model (physics-based, not deep learning)

DNAworks: automatic oligonucleotide design for PCR-based gene synthesis

OpenGenome2: the 8.8T-nucleotide genome dataset Arc used to train Evo2

gnomAD: database of human genetic variation from diverse populations, covering about 807K individuals (exomes and genomes), 3+ petabytes raw, 35 TB variant summaries

RifDock: rigid-body docking tool for initial binder-target orientations, assumes proteins don’t flex

Rosetta: protein structure energy scoring and sampling (uses Monte Carlo search, not neural networks)

Rosetta Fast Relax: structure refinement tool used to iteratively improve designs

SELFIES: string representation of molecules in which every string decodes to a syntactically and semantically valid molecular graph

Savanna: pre-training infrastructure for multi-hybrid AI model architectures, like StripedHyena2
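
The Rosetta entry above mentions Monte Carlo rather than neural networks. As a rough illustration of that style of search, here is a minimal Metropolis Monte Carlo sketch on a toy one-dimensional energy. It is not Rosetta's code or energy function; `toy_energy`, `toy_propose`, the temperature, and the step count are all hypothetical stand-ins.

```python
import math
import random
from typing import Callable, Tuple


def metropolis_accept(delta_energy: float, temperature: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE / T)."""
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)


def monte_carlo_minimize(
    energy: Callable[[float], float],
    propose: Callable[[float, random.Random], float],
    start: float,
    steps: int,
    temperature: float,
    seed: int = 0,
) -> Tuple[float, float]:
    """Run a Metropolis search and return the lowest-energy state seen and its energy."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_e = energy(candidate)
        if metropolis_accept(candidate_e - current_e, temperature, rng):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


def toy_energy(x: float) -> float:
    """Toy stand-in for a structure score: a single well with its minimum at x = 2."""
    return (x - 2.0) ** 2


def toy_propose(x: float, rng: random.Random) -> float:
    """Toy stand-in for a structural perturbation: a small random step."""
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = monte_carlo_minimize(toy_energy, toy_propose, start=10.0, steps=2000, temperature=0.1)
```

Occasionally accepting uphill moves is what lets this kind of search escape local minima; in a real design workflow the state would be a structure and the proposal a backbone or side-chain move.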