Molecular Biology

Azimuth: annotated reference dataset for single-cell RNA-seq or ATAC-seq experiments

BiGG Models: gold-standard genome-scale metabolic models

BioCyc: pathway/genome databases

BioModels: a database of models used in biology

cBioPortal: portal that allows you to explore cancer genomics data visually

Database Commons: a database of biological databases

Dunbrack backbone-dependent rotamer library: provides side-chain conformations based on backbone dihedral angles, enhancing the accuracy of protein design and modeling

ENCODE: human regulatory DNA

Ensembl: genome data for vertebrates and model organisms

Gene Expression Omnibus (GEO): repository for gene expression datasets

Gene Ontology (GO): directed acyclic graph (DAG) of biological terms
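
Because GO is a DAG rather than a tree, a term can have several parents, and a gene annotated to a term is implicitly annotated to every ancestor of that term. A minimal sketch of the ancestor lookup, assuming a toy ontology with made-up term names (`term_a` to `term_d` are hypothetical, not real GO identifiers) stored as a child-to-parents dictionary:

```python
# Hypothetical toy ontology: each term maps to its direct parents.
# term_d has two parents, which is what makes this a DAG and not a tree.
PARENTS = {
    "term_d": ["term_b", "term_c"],
    "term_b": ["term_a"],
    "term_c": ["term_a"],
    "term_a": [],  # root term
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("term_d", PARENTS)))
```

The `seen` set keeps a shared ancestor (here `term_a`) from being visited twice when it is reachable through more than one parent.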

GotEnzymes: database of 25M+ predicted $k_{cat}$ values (turnover numbers) describing how fast specific enzymes work

GPCRdb: focused on GPCRs

GTEx: Genotype-Tissue Expression (GTEx) Portal has expression data from 3 NIH projects

GWAS Catalog: database of SNP-trait associations

HGNC (HUGO Gene Nomenclature Committee): official human gene names, including receptors

HuBMAP: modeling the human body at the single cell level

Human Cell Atlas: mapping and annotating every cell in humans

Human Metabolome Database: comprehensive database on human metabolites, biomarkers, quantitative

IEDB (Immune Epitope Database): catalogues experimentally verified immune epitopes (2M of them), which are fragments of antigens recognized by B cells, T cells, and/or MHC molecules

IUPHAR/BPS Guide to Pharmacology: curated list of receptors, ligands, drug targets

KEGG: models molecular interactions and pathways

Metabolic map: metabolic map of E. coli and other organisms; eventually will cover humans as well

OpenHumans: platform for sharing data on many different topics, with lots of microbiome, genetics, variant, and viral datasets

OpenWetWare.org: experimental protocol information

ORF finder: searches for open reading frames (ORFs) in the DNA sequence you enter
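
To make the idea of an ORF search concrete, here is a minimal sketch that scans the three forward reading frames for a start codon followed in-frame by a stop codon. It is an illustration only, not how ORF finder itself is implemented: the function name `find_orfs` and the `min_codons` parameter are made up for this example, and it ignores the reverse strand, alternative start codons, and non-standard genetic codes.

```python
# Standard stop codons (assumption: standard genetic code, DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    `start` is the 0-based index of the A in ATG; `end` is exclusive and
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and look for the next ATG
    return orfs


for start, end, frame in find_orfs("CCATGAAATAGCC"):
    print(frame, start, end)
```

Each frame is scanned independently, so overlapping ORFs in different frames are all reported.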

parts.igem.org: standard biological parts, which actually aren’t very standardized and need to be made engineering-friendly

PathBank: biochemical reactions and interactions within cells

PDB: protein data bank

PHASTER, PHASTEST, VIBRANT: tools for identifying and annotating phage sequences in genomes

PhysiCell: virtual laboratory for agent-based modeling of cells

Physiome: models organ and tissue function, including circulation, respiration, muscle dynamics (software: https://physiomeproject.org/software)

Reactome: molecular interactions of cellular processes, pathways

Saccharomyces Genome Database (SGD): 12M base pairs of yeast DNA sequence and the annotation of over 6k genes + thousands of experiments

Sequence Read Archive (SRA): repository of high throughput sequencing data

STRING: protein-protein interaction networks, used to identify Ras-associated human proteins

TCGA (The Cancer Genome Atlas Project): molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types

Whole cell model of Mycoplasma genitalium: a whole cell model of a living organism, like the worm

WikiPathways (NetPath): signal transduction pathways in human cells

UniProt: protein sequence and functional information, receptors

uORFdb: upstream ORFs for genetic editing

VDJdb: database of T-cell receptor (TCR) sequences with known antigen specificities

Virtual Physiological Human: simulates entire-body physiological interactions, up to disease progression

Biotech

Drugs@FDA: FDA-approved drugs by month

Enamine REAL Space: 38B make-on-demand compounds enumerated from building-block combinations

  • Is REAL space limited by the Lipinski rule?

FDA Purple Book: approved biologics and biosimilars

KEGG: drug database

Tools

aRNA amplification: linear amplification of single-cell RNA (what is logarithmic?)

RUM (RNA-seq Unified Mapper): alignment tool that maps RNA-seq reads to the genome (very old, don’t recommend using)

Oakvar: collection of genome and variant annotation tools

TODO:

HTSeq + htseq-count: quantification tool that assigns reads to genes

like a neural network?

Outlier-sum statistic: statistical test that detects extreme variation in single-cell expression

what limit for them?

F-statistic (expression noise): variation metric that quantifies biological vs technical variability

DAVID / MGI / AmiGO: functional annotation that compares gene lists across species

Jaccard index: similarity metric for two sets (size of the intersection divided by size of the union), used to compare gene lists across species
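
The Jaccard index of two sets A and B is |A ∩ B| / |A ∪ B|, ranging from 0 (no shared members) to 1 (identical sets). A minimal sketch for gene lists; the function name is made up for this example, the two lists are illustrative rather than taken from any dataset, and returning 1.0 for two empty lists is a convention chosen here:

```python
def jaccard_index(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(union)


# Illustrative gene lists (hypothetical example data).
list_1 = ["TP53", "KRAS", "MYC"]
list_2 = ["KRAS", "MYC", "EGFR"]
print(jaccard_index(list_1, list_2))  # 2 shared / 4 total = 0.5
```

For a cross-species comparison the gene identifiers would first need to be mapped to a shared namespace (for example ortholog groups), since the metric only counts exact matches.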

FISSEQ (fluorescent in situ sequencing): sequencing technology that reads RNA directly inside cells and tissues

principle of in situ: “in its original place”

Rolling Circle Amplification (RCA): molecular biology method that amplifies circularized cDNA in situ to form DNA nanoballs

Partition sequencing: optical trick that controls density of signal by using partially matched sequencing primers

SOLiD Sequencing by Ligation: NGS chemistry that uses fluorescence-based base-calling inside fixed samples

BS(PEG)9: cross-linking reagent that anchors amplified cDNA in place to preserve spatial information

Deconvolved microscopy: imaging method that improves resolution of in situ sequencing signal

Add for binding assay data (data from validated experiments):

  1. ChEMBL - Most comprehensive
    • 2M+ compounds, 13K+ targets, 2M+ assays
    • Free API access
    • Focus: Drug discovery, medicinal chemistry
  2. BindingDB - Specialized binding data
    • 2M+ binding measurements
    • Free web interface
    • Focus: Protein-ligand binding affinities
  3. PubChem BioAssay - NIH database
    • 1M+ bioassays, 300K+ compounds
    • Free access
    • Focus: High-throughput screening data

Specialized Databases:

  1. PDB (Protein Data Bank) - Structural data
    • 200K+ protein structures
    • Binding site information
    • Focus: 3D structures and binding sites
  2. UniProt - Protein annotations
    • Experimental evidence for protein function
    • Binding partner information
    • Focus: Protein function and interactions
  3. IntAct - Molecular interactions
    • Protein-protein interactions
    • Experimental interaction data
    • Focus: Protein interaction networks