Neural Networks in Biology
ABodyBuilder2: predicts the structure of antibodies
AlphaFold3: AF2 + ligands, DNA, RNA, and post-translational modifications
ATOMICA: universal representations of intermolecular interactions including protein-small molecule, protein-ion, small molecule-small molecule, protein-protein, protein-peptide, protein-RNA, protein-DNA, and nucleic acid-small molecule complexes
BE-DICT: predicts outcomes of CRISPR base editing
CellOT: neural optimal transport model predicting how cells transition across time, used in development and disease
Chroma: vector DB + retriever, use for integrating + querying protein design history, experimental metadata, model outputs
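ClinVar: NCBI public database of human genetic variants with their reported clinical significance (pathogenic, benign, uncertain); a reference dataset, not a neural network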
Cortex: modular architecture for deep learning systems, a general purpose scoring oracle for antibody optimization pipelines
DBNN: biological molecular circuits in neural form instead of discrete, boolean gates; used for testing before wet lab verification
DeepAccNet: protein model accuracy evaluator
DeepAIR: predicts TCR/BCR-antigen binding affinity and reactivity
DeepRC: modern Hopfield network with attention that classifies immune repertoires, e.g. by disease status
DeepVirFinder: identifies viral sequences in metagenomic data
DeepZF: predicts ZF-DNA binding
DiffDock: diffusion model for ligand-protein docking, outperforms traditional scoring in benchmarks
dwJS: walk-jump sampling, which combines score-based + energy-based models for antibody generation, used for exploring novel sequence variants far from known binders
DyAb: uses pair-wise representation to predict differences in protein properties, rather than absolute values; use when prioritizing/generating antibody mutations for property shifts
Enformer: gene expression prediction over long range DNA sequences
ESM models: protein language models; ESMFold uses them to predict structure from a single sequence
EvoDiff: generative protein design, drug-like molecules possible
Evo2: predicts mutations, function, fitness, structure across the OpenGenome2 dataset
FoldX: predicts how mutations affect a protein's stability and interactions
GEARS: predicts how gene expression will change when specific genes are edited (knocked out, over-expressed) (Shift Biosciences)
GNINA: deep learning framework for molecular docking
GPN-MSA: DNA language model trained on multi-species whole-genome alignments; predicts the effects of variants across the human genome
GraphBAN: predicts compound-protein interactions
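ESM3: generative protein foundation model over sequence, structure, and function; its authors describe generating a new fluorescent protein as simulating 500M years of evolution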
mosGraphGPT: foundation model trained on multi-omic data to predict cell states and disease outcomes across species (transformer-based attention)
LaMBO-2: multi-objective optimization over discrete (categorical) sequence space; use when optimizing antibodies for multiple properties at once, like affinity and expression
LBSTER: base model you can use for fine-tuning or embedding antibodies, with representations learned from UniRef50 + antibody space
LigandMPNN: deep learning model that allows explicit modeling of small molecule, nucleotide, metal, and other atomic contexts
LungTCR: lung cancer prediction based on T-cell receptor data
MMseqs2: sequence clustering/deduplication, reducing redundancy in training sets; use when cleaning data before model training
NEHVI: noisy expected hypervolume improvement, an acquisition function that ranks candidates by how much they are expected to expand the Pareto front; use when selecting non-dominated sequences under uncertainty
PanGenie: haplotype-aware genotyping using pangenome graphs
PepTune: de novo generation of new therapeutic peptides by gradually improving random sequences against multiple design goals, like hitting a target, lasting in the body, and being easy to make
ProtGPT2: de novo protein design, autoregressive transformer trained on UniRef50 for protein generation
ProteinMPNN: a sequence design neural network that takes a fixed 3D backbone structure and predicts amino acid sequences likely to fold into it, use it after generating a novel 3D structure, but before experimental expression
PropEn: transforms low-affinity to high-affinity sequences, use to find an average across strong and weak binders
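RFDiffusion2: takes in atomic coordinates of the residues to be scaffolded (e.g. an active site) and generates a protein backbone around them
scGPT: foundation model for single-cell data; identifies and classifies cells based on gene expression profiles (Shift Biosciences)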
scVI: variational auto-encoder for analyzing single-cell RNA-seq, used in single-cell workflows
SeqVDM: variational diffusion model adapted for protein sequences, used for continuous control over diversity and stability of designs
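soNNia: NN that infers selection pressures on TCR/BCR sequences, scoring how likely a receptor is to appear in a functional repertoire
StripedHyena: hybrid architecture mixing Hyena (gated convolution) operators with attention for long sequences; use when modeling long protein sequences/genomes with limited compute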
TCRAI: CNN + MIL model that classifies TCR-antigen binding and immune repertoire
Viral Mutation: language model for virus evolution
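
The NEHVI and LaMBO-2 entries above both rest on Pareto improvement. A minimal sketch of the underlying idea in plain Python, assuming two objectives that are both maximized (say affinity and expression) and made-up scores. This is not the NEHVI implementation: it only shows non-dominated filtering and the 2-D hypervolume whose expected gain NEHVI estimates. All function names here are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(points, ref):
    """Area dominated by the front and bounded below by the reference point (2 objectives)."""
    # Sorted by the first objective descending, the second objective rises along the front.
    front = sorted(pareto_front(points), key=lambda p: p[0], reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        if x > ref[0] and y > prev_y:
            area += (x - ref[0]) * (y - prev_y)  # add the new horizontal strip
            prev_y = y
    return area


def hypervolume_improvement(candidate, points, ref):
    """How much the dominated area grows if the candidate is added to the set."""
    return hypervolume_2d(points + [candidate], ref) - hypervolume_2d(points, ref)


# Hypothetical (affinity, expression) scores for five designs.
scores = [(1, 4), (2, 3), (3, 1), (2, 2), (1, 1)]
reference = (0, 0)
front = pareto_front(scores)
gain_good = hypervolume_improvement((3, 3), scores, reference)
gain_none = hypervolume_improvement((1, 1), scores, reference)
```

A candidate that is already dominated adds no area, so its improvement is zero; NEHVI averages this quantity over model uncertainty instead of using fixed scores.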
Tools
AutoDock Vina: ligand-protein docking and screening model (physics-based, not deep learning)
DNAworks: automatic oligonucleotide design for PCR-based gene synthesis
OpenGenome2: the 8.8-trillion-nucleotide genome dataset Arc Institute used to train Evo2
gnomAD: database of human genetic variation from diverse populations, covering 807K individuals (exomes + genomes), 3+ petabytes raw, 35 TB variant summaries
RifDock: rigid-body docking tool for initial binder-target orientations, assumes proteins don't flex
Rosetta: protein structure energy scoring (uses Monte Carlo, not NNs)
Rosetta Fast Relax: structure refinement tool used to iteratively improve designs
SELFIES: string representation of molecules in which every string decodes to a syntactically and semantically valid molecular graph; useful for generative models
Savanna: pre-training infrastructure for multi-hybrid AI model architectures, like StripedHyena2
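
The Rosetta entry above notes that it searches with Monte Carlo rather than a neural network. A minimal sketch of the Metropolis acceptance rule that this kind of search is built on, assuming a one-dimensional toy energy function. Nothing here is Rosetta's API; `toy_energy`, `metropolis_accept`, and `minimize` are hypothetical names for illustration.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def minimize(energy, x0, steps=2000, step_size=0.5, temperature=0.1, seed=0):
    """Random-walk Monte Carlo search that tracks the lowest-energy state seen."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)  # small random perturbation
        trial_e = energy(trial)
        if metropolis_accept(trial_e - e, temperature, rng):
            x, e = trial, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x):
    """Stand-in for a structure score: a single well with its minimum at x = 2."""
    return (x - 2.0) ** 2


best_x, best_e = minimize(toy_energy, x0=10.0)
```

Occasionally accepting uphill moves is what lets this kind of search escape local minima; a real design run perturbs a structure and scores it with an energy function instead of a scalar.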