Molecular Biology

AlphaFold DB: 200M predicted structures

storage requirements:

scripts: 20MB
system dependencies: 5GB
databases: 250GB
model weights: 1GB
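
As a quick sanity check, the four sizes listed above can be summed to estimate the total disk footprint. This is only arithmetic on the numbers in the list; the conversion 1 GB = 1000 MB is an assumption.

```python
# Sizes copied from the list above, expressed in gigabytes.
# Assumption: 1 GB = 1000 MB.
components_gb = {
    "scripts": 20 / 1000,
    "system dependencies": 5,
    "databases": 250,
    "model weights": 1,
}

total_gb = sum(components_gb.values())
print(f"approximate total: {total_gb:.2f} GB")
```

The databases dominate: everything else together is about 6 GB.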

Azimuth: annotated reference dataset for single-cell RNA-seq or ATAC-seq experiments

BiGG Models: gold-standard genome-scale metabolic models

BindingDB: 3M+ protein-ligand complexes

BioCyc: pathway/genome databases

BioModels: a database of mathematical models of biological systems

cBioPortal: portal that allows you to explore cancer genomics data visually

ChEMBL: 2.5M compounds

Database Commons: a database of biological databases

Dunbrack backbone-dependent rotamer library: provides side-chain conformations based on backbone dihedral angles, enhancing the accuracy of protein design and modeling

ENCODE: human regulatory DNA

Ensembl: genome data for vertebrates and model organisms

Gene Expression Omnibus (GEO): repository for gene expression datasets

Gene Ontology (GO): directed acyclic graph (DAG) of biological terms
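
A minimal sketch of what the DAG structure means in practice: a term can have more than one parent, so finding everything a term "is a kind of" requires walking all paths upward. The term names below are a hypothetical toy ontology, not real GO identifiers, and `ancestors` is an illustrative helper, not part of any GO library.

```python
# Hypothetical toy ontology: each term maps to its list of parent terms.
# These names are made up for illustration and are NOT real GO terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "kinase_like": ["binding", "catalysis"],  # two parents: a DAG, not a tree
    "specific_kinase": ["kinase_like"],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("specific_kinase", PARENTS)))
```

Because terms can share ancestors through several paths, the `seen` set keeps each ancestor from being visited more than once.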

GotEnzymes: database of 25M+ predicted $k_{cat}$ values (turnover numbers, i.e. how fast specific enzymes work)

GPCRdb: focused on GPCRs

GTEx: Genotype-Tissue Expression (GTEx) Portal has expression data from 3 NIH projects

GWAS Catalog: database of SNP-trait associations

HGNC (HUGO Gene Nomenclature Committee): official gene names for human receptors

HuBMAP: modeling the human body at the single cell level

Human Cell Atlas: mapping and annotating every cell in humans

Human Metabolome Database: comprehensive database on human metabolites, biomarkers, quantitative

IEDB (Immune Epitope Database): catalogues experimentally verified immune epitopes (2M of them), which are fragments of antigens recognized by B cells, T cells, and/or MHC molecules

IUPHAR/BPS Guide to Pharmacology: curated list of receptors, ligands, drug targets

KEGG: pathway maps of molecular interactions and reaction networks

Metabolic map: metabolic map of E. coli and other organisms, eventually intended to cover humans

ModelArchive: 3,000 predicted structures

OpenHumans: platform for sharing data on many different topics, including many microbiome, genetics, variant, and viral datasets

OpenWetWare.org: experimental protocol information

ORF finder: searches for open reading frames (ORFs) in the DNA sequence you enter
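
To make the idea of an ORF search concrete, here is a minimal sketch. It is a deliberately simplified stand-in, not the ORF finder tool's actual algorithm: it scans only the forward strand, treats `ATG` as the only start codon, and uses the three standard stop codons. The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG...stop ORF on the forward strand.

    Simplifications: forward strand only, ATG as the only start codon.
    min_codons counts the start and stop codons as part of the ORF length.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATAGGG"))
```

A fuller version would also scan the reverse complement and allow alternative start codons.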

parts.igem.org: standard biological parts, which actually aren’t very standardized and need to be made engineering-friendly

PathBank: biochemical reactions and interactions within cells

PDB: Protein Data Bank, 238,000 structures

PDBbind+: 34,000 molecular complexes

PHASTER, PHASTEST, VIBRANT: tools for identifying phage sequences in genomic data

PhysiCell: virtual laboratory for agent-based modeling of cells

Physiome: models organ and tissue function, including circulation, respiration, muscle dynamics (software: https://physiomeproject.org/software)

PubChem: 121M compounds

QM9: 134,000 stable organic molecules with up to 9 heavy atoms (carbon, oxygen, nitrogen, fluorine)

Reactome: molecular interactions of cellular processes, pathways

SAbDab: structural antibody dataset with 9,680 structures

Saccharomyces Genome Database (SGD): 12M base pairs of yeast DNA sequence and the annotation of over 6k genes + thousands of experiments

Sequence Read Archive (SRA): repository of high throughput sequencing data

STRING: protein-protein interaction networks, used to identify Ras-associated human proteins

TCGA (The Cancer Genome Atlas): molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types

Whole-cell model of Mycoplasma genitalium: a whole-cell model of a living organism, like the worm

WikiPathways (NetPath): signal transduction pathways in human cells

UniProt: protein sequence and functional information, receptors

uORFdb: upstream ORFs for genetic editing

VDJdb: database of T-cell receptor (TCR) sequences with known antigen specificities

Virtual Physiological Human: simulates entire-body physiological interactions, up to disease progression

Biotech

Drugs@FDA: FDA approved drugs by month

Enamine REAL Space: 38B make-on-demand compounds

Is REAL space limited by the Lipinski rule?
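
For reference when checking this, Lipinski's rule of five is commonly stated as: molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. The sketch below only encodes those four thresholds; it does not answer the question about REAL Space. It assumes the descriptor values have already been computed elsewhere (for example by a cheminformatics toolkit), and the `Descriptors` class, the example values, and the `max_violations=1` default are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float       # daltons
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d):
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d, max_violations=1):
    # Assumption: one violation is tolerated, a common convention.
    return lipinski_violations(d) <= max_violations


# Hypothetical example values, not taken from any database.
small = Descriptors(mol_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(mol_weight=900.0, logp=6.5, h_bond_donors=7, h_bond_acceptors=12)
print(passes_rule_of_five(small), passes_rule_of_five(large))
```

Applying a filter like this to a sample of a library's descriptor table would be one way to estimate what fraction of it is rule-of-five compliant.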

FDA Purple Book: approved biologics and biosimilars

KEGG: drug database

Tools

aRNA amplification: linear amplification of single-cell RNA (what is logarithmic?)
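
On the parenthetical question: the contrast is presumably with exponential, PCR-style amplification, where products become templates for the next cycle. A toy comparison under idealized assumptions (perfect efficiency, made-up numbers, hypothetical function names) might look like this:

```python
def linear_copies(templates, transcripts_per_template):
    """Linear amplification: only the original templates are copied,
    so the product scales in proportion to the starting material."""
    return templates * transcripts_per_template


def exponential_copies(templates, cycles):
    """Exponential amplification: every product is a template in the next
    cycle, so the copy number doubles per cycle (idealized, 100% efficiency)."""
    return templates * 2 ** cycles


# Hypothetical numbers chosen only to show the difference in growth.
print(linear_copies(10, 1000))     # 10 templates, 1000 transcripts each
print(exponential_copies(10, 20))  # 10 templates, 20 doubling cycles
```

In the linear case the ratio between two transcripts' copy numbers stays equal to their starting ratio, which is why it is attractive when relative abundance matters.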

HADDOCK: docking modeling

RUM (RNA-seq Unified Mapper): alignment tool that maps RNA-seq reads to the genome (very old, don’t recommend using)

OakVar: collection of genome and variant annotation tools