Basic math LEGO bricks that I have used while building at the intersection of computers x biology. I use this as a reference.
Calculus + Linear Algebra
Component | Definition | In Biology / Computing | Math Variables |
Derivative | Instantaneous rate of change | Enzyme kinetics, population growth, gradient descent | |
Partial Derivative | Rate of change w.r.t. one variable | Multi-omics sensitivity, energy landscapes | |
Gradient | Vector of partial derivatives | Backpropagation, optimization | |
Jacobian | Matrix of first-order derivatives | Sensitivity analysis, neural nets | |
Hessian | Matrix of second-order derivatives | Curvature in protein folding, optimization | |
Taylor Expansion | Approximation around a point | Local linearization of pathways, dynamics | |
Chain Rule | Derivative of composite functions | Backpropagation in neural nets | |
Integral | Accumulated quantity | Population size from growth rate, cumulative flux | |
Definite Integral | Area under curve / accumulation over interval | Protein abundance from expression rate | |
Multiple Integral | Integration over multidimensional space | Partition functions, probability densities | |
Divergence | Outflow rate of a vector field | Flux of ions, transport phenomena | |
Curl | Rotation of a vector field | Electromagnetic models in biophysics | |
Laplacian | Divergence of gradient | Diffusion models, electrophysiology | |
Linear Equation | Equation in vector/matrix form | Kinetics, statistical models | |
Matrix Multiplication | Transformation, composition | Feature embeddings, transition systems | |
Determinant | Scalar property of matrix | Volume scaling, invertibility | |
Inverse Matrix | Solves linear systems | Regression, circuit models | |
Eigenvalue / Eigenvector | Invariant scaling directions | PCA, network stability | |
SVD | Decompose into orthogonal bases | Dimensionality reduction (scRNA-seq) | |
QR / LU / Cholesky | Matrix factorizations for solving | Numerical solvers for pathway models | |
Projection | Mapping onto subspace | Embeddings, latent space representations | |
Inner Product | Measure of similarity | Cosine similarity, kernel methods | |
Norm | Length / magnitude of vector | Regularization, error measures | |
Mahalanobis Distance | Distance scaled by covariance | Anomaly detection in cell states | |
Orthogonality | Perpendicular vectors/bases | PCA axes, Fourier modes | |
Rank | Dimension of column/row space | Degrees of freedom in system | |
Trace | Sum of diagonal elements | Invariant measure in statistics | |
Condition Number | Sensitivity of linear system | Numerical stability in simulations | |
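To make the derivative, partial derivative, and gradient rows above concrete, here is a minimal sketch that checks an analytic gradient against a central finite-difference estimate. The function `f`, the helper names, and the evaluation point are made up for illustration and do not come from any particular biological model.

```python
import math

def f(x, y):
    # Hypothetical smooth test function of two variables.
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # Analytic gradient: (df/dx, df/dy).
    return (2 * x * y, x**2 + math.cos(y))

def numerical_grad(func, point, h=1e-5):
    # Central difference per coordinate: (f(p + h*e_i) - f(p - h*e_i)) / (2h).
    grads = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grads.append((func(*up) - func(*down)) / (2 * h))
    return tuple(grads)

point = (1.5, 0.7)
analytic = grad_f(*point)
numeric = numerical_grad(f, point)
print("analytic:", analytic)
print("numeric: ", numeric)
```

This kind of check is a common way to catch sign or indexing mistakes in a hand-derived gradient before using it in an optimizer.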
Probability, Statistics, Information Theory
Component | Definition | In Biology / Computing | Math Variables |
Probability Distribution | Assigns likelihood to outcomes | Gene expression variability, sequencing errors | |
Expectation | Average value under distribution | Mean expression, fitness landscape averages | |
Variance | Spread around the mean | Expression noise, measurement error | |
Covariance | Joint variability of two variables | Co-expression of genes | |
Correlation | Normalized covariance (−1 to 1) | Gene-gene correlation networks | |
Law of Large Numbers | Averages converge to expectation | Replicates reduce noise | |
Central Limit Theorem | Normal distribution emerges from sums | Sequencing depth → Gaussian errors | |
Bernoulli | Single binary outcome | Mutation present/absent | |
Binomial | Sum of Bernoullis | Count of mutations in reads | |
Poisson | Events in fixed interval | RNA counts, sequencing reads | |
Negative Binomial | Overdispersed counts | scRNA-seq models | |
Normal Distribution | Gaussian variability | Expression levels, errors | |
Log-normal | Multiplicative noise | Protein abundance distributions | |
Gamma | Waiting times, skewed distributions | Reaction times, lifetimes | |
Beta | Distribution on [0,1] | Allele frequencies, probabilities | |
Dirichlet | Multivariate generalization of Beta | Topic/cell-type mixtures | |
Multinomial | Multi-class counts | Reads across cell types | |
Maximum Likelihood (MLE) | Best-fit parameters from data | Fit kinetic rates, emission probs | |
Maximum A Posteriori (MAP) | MLE with prior | Regularized estimates | |
Bayesian Inference | Updates belief with data | Model calibration in biology | |
Hypothesis Test | Decision on effect | Gene expression DE tests | |
t-test | Mean difference test | Differential gene expression | |
Chi-square test | Goodness of fit | Contingency tables in genomics | |
FDR / BH Procedure | Controls multiple testing | GWAS, RNA-seq DE genes | |
Bootstrap | Resampling with replacement | CI for small-sample experiments | |
Entropy | Uncertainty of distribution | Cell diversity, motif randomness | |
Cross-Entropy | Divergence between true & model | Loss for classifiers | |
KL Divergence | Relative entropy | Compare distributions (healthy vs disease) | |
Jensen–Shannon Divergence | Symmetrized KL | Compare embeddings | |
Mutual Information | Shared info between vars | Regulatory network inference | |
Perplexity | Exponential of entropy | Quality of embeddings, models | |
Information Bottleneck | Tradeoff compression vs. relevance | Latent representation of cell states | |
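The entropy, cross-entropy, KL, and Jensen–Shannon rows above are tied together by the identity H(p, q) = H(p) + KL(p || q). The sketch below computes each quantity in nats for two small discrete distributions. The vectors `healthy` and `disease` are invented toy numbers, and the functions assume q > 0 wherever p > 0.

```python
import math

def entropy(p):
    # Shannon entropy H(p) in nats; zero-probability terms contribute 0.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # Cross-entropy H(p, q); assumes q_i > 0 wherever p_i > 0.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); not symmetric in p and q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    # Jensen-Shannon divergence: symmetrized KL against the mixture m.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def perplexity(p):
    # Perplexity = exp(entropy): the effective number of outcomes.
    return math.exp(entropy(p))

# Toy cell-type fractions (illustrative numbers only).
healthy = [0.5, 0.3, 0.2]
disease = [0.2, 0.3, 0.5]

print("H(healthy)        =", entropy(healthy))
print("KL(healthy||dis.) =", kl_divergence(healthy, disease))
print("JS                =", js_divergence(healthy, disease))
```

With natural logarithms the Jensen–Shannon divergence is bounded by log 2, which makes it easier to compare across datasets than raw KL.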
Optimization and ML Primitives
Component | Definition | In Biology / Computing | Math Variables |
Loss Function | Scalar measure of prediction error | Training models for gene expression, structure prediction | |
Mean Squared Error (MSE) | Average squared error | Regression tasks, kinetics fitting | |
Cross-Entropy Loss | Divergence between distributions | Classification, motif recognition | |
Hinge Loss | Margin-based loss | Support vector machines, binary classification | |
Regularization | Penalty term to avoid overfitting | Control model complexity | |
L1 Regularization | Promotes sparsity | Feature selection in omics | |
L2 Regularization | Penalizes large weights | Ridge regression, weight decay | |
Gradient Descent | Iterative optimization step | Neural nets, ODE parameter fitting | |
Stochastic Gradient Descent (SGD) | Uses minibatches for updates | Large-scale models on bio data | |
Momentum | Uses past updates to accelerate | Faster convergence in training | |
Adam Optimizer | Adaptive moment estimation | Standard in DL training | |
L-BFGS | Quasi-Newton optimization | Energy minimization in proteins | Updates use inverse Hessian approximation |
Conjugate Gradient | Iterative quadratic solver | Sparse system solvers in genomics | |
Coordinate Descent | Optimizes one variable at a time | LASSO, constrained models | |
Lagrangian Multipliers | Optimization with constraints | Flux balance analysis | |
KKT Conditions | Optimality for constrained optimization | Biochemical flux solutions | Stationarity, primal/dual feasibility, complementarity |
Proximal Operator | Handles non-smooth penalties | Sparse regression, TV denoising | |
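Expectation-Maximization (EM) | Iterative latent variable inference | Mixture models, cell deconvolution | E-step: compute \(Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{z \mid x, \theta^{(t)}}[\log p(x, z \mid \theta)]\), M-step: maximize \(Q\) over \(\theta\) |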
k-Means | Clustering by minimizing within-cluster variance | Cell type clustering | |
Gaussian Mixture Model (GMM) | Soft clustering with Gaussians | Expression distributions | |
Softmax | Converts scores to probabilities | Classification, attention weights | |
ReLU | Nonlinear activation | Neural networks | |
Sigmoid | Squashes to (0,1) | Logistic regression, gating | |
Tanh | Squashes to (-1,1) | Normalized activations | |
Attention Mechanism | Weighted combination of inputs | Protein sequence models, embeddings | |
Contrastive Loss | Brings similar pairs together | Align cell embeddings, protein-ligand | |
VAE ELBO | Lower bound for latent variable models | Generative scRNA-seq | |
Diffusion SDE | Forward corruption process | Generative protein design | |
Graph Laplacian | Encodes connectivity | Protein–protein interaction networks | |
Message Passing | Node embedding updates | GNNs for molecules | |
Equivariance (SE(3)) | Preserves geometric symmetries | Protein structure prediction | |
Clustering Modularity | Graph-based community detection | Gene co-expression networks | |
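Two of the primitives in the table above are short enough to write out in full: a numerically stable softmax, and the proximal operator of the L1 penalty (soft-thresholding), which is the building block behind sparse regression. This is a minimal sketch; the function names are my own and are not tied to any library.

```python
import math

def softmax(scores):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp().
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_threshold(x, lam):
    # Proximal operator of lam * |x|: shrink toward zero by lam, clip at zero.
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

print(softmax([1.0, 2.0, 3.0]))
print(softmax([1000.0, 1000.0]))  # would overflow without the max shift
print([soft_threshold(v, 1.0) for v in (3.0, -0.5, -3.0)])
```

Soft-thresholding sets small coefficients exactly to zero, which is why L1 regularization yields sparse solutions while L2 only shrinks them.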
Dynamics (ODEs, PDEs, Stochastic Processes, Control/RL)
Component | Definition | In Biology / Computing | Math Variables |
Ordinary Differential Equation (ODE) | Time evolution of variables | Gene circuits, pharmacokinetics | |
System of ODEs | Multiple interacting variables | Pathways, population dynamics | |
Partial Differential Equation (PDE) | Evolution over space + time | Diffusion of molecules, electrophysiology | |
Heat Equation | Diffusion PDE | Ion transport, morphogen gradients | |
Wave Equation | Propagation dynamics | Nerve impulses, biomechanics | |
Poisson Equation | Potential field equation | Electrostatics in proteins | |
Lotka–Volterra | Predator-prey dynamics | Species competition, host-virus | |
Michaelis–Menten Kinetics | Enzyme rate law | Metabolic modeling | |
Hill Equation | Cooperative binding | Gene regulation, transcription factors | |
Mass-Action Kinetics | Reaction rate ∝ concentrations | Systems biology, stoichiometric models | |
Gillespie Algorithm | Stochastic simulation of reactions | Single-cell variability | Generates trajectories via random exponential waiting times |
Markov Chain (transition + distribution update) | Memoryless state transitions | Mutation models, sequence evolution | |
Hidden Markov Model (HMM) | Latent-state probabilistic model, joint likelihood | Gene finding, protein domains | |
Stochastic Differential Equation (SDE) | Dynamics with noise | Noisy gene expression, molecular motion | |
Ornstein–Uhlenbeck Process | Mean-reverting SDE | Noise in biophysical systems | |
Random Walk | Successive random steps | Diffusion models, genome scans | |
Brownian Motion | Continuous random process | Molecular dynamics | |
Control System | Regulates state of system | Bioreactor control, feedback | |
PID Controller | Proportional-integral-derivative control | Lab robotics, process stabilization | |
Model Predictive Control (MPC) | Optimization-based control | Dynamic bioprocess optimization | |
Markov Decision Process (MDP) | Sequential decision framework | Adaptive experiment design | |
Bellman Equation | Recursive value definition | RL-based model design | |
Policy Gradient | Optimizes expected reward | Reinforcement learning in biology | |
Advantage Estimation (GAE) | Variance-reduced estimator | Efficient policy training | |
PPO (Proximal Policy Optimization) | RL with clipped objective | Safe training in bio RL models | |
SAC (Soft Actor-Critic) | RL with entropy maximization | Exploration in protein design | |
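The Gillespie row above can be illustrated with the simplest possible reaction network: a birth-death process in which a molecule is produced at a constant rate and degraded at a rate proportional to its copy number. The sketch below is a minimal direct-method simulation. The rate constants, the function name, and the seed are arbitrary choices for illustration.

```python
import random

def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    # Direct-method Gillespie simulation of: 0 -> X (rate k_prod), X -> 0 (rate k_deg * n).
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_birth = k_prod          # propensity of production
        a_death = k_deg * n       # propensity of degradation
        a_total = a_birth + a_death
        if a_total == 0:
            break                 # no reaction can fire
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

times, counts = gillespie_birth_death(k_prod=10.0, k_deg=0.1, n0=0, t_end=50.0)
print("events:", len(times) - 1, "final copy number:", counts[-1])
```

For these rates the deterministic steady state is k_prod / k_deg = 100 molecules, and individual trajectories fluctuate around that value.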
Signal Processing, Numerical Methods & HPC
Component | Definition | In Biology / Computing | Math Variables |
Convolution | Weighted overlap of functions | Motif scanning, microscopy filtering | |
Correlation | Similarity via shifted overlap | Template matching in sequences | |
Fourier Transform | Decomposes into frequencies | MRI k-space, EEG | |
Discrete Fourier Transform (DFT) | Finite-sample version | Sequence periodicity detection | |
Fast Fourier Transform (FFT) | Efficient DFT algorithm | Bio-signal analysis at scale | \(O(N \log N)\) complexity |
Wavelet Transform | Time–frequency decomposition | Microscopy image denoising | |
Radon Transform | Line integrals of function | CT reconstruction | |
Filter (Low/High-pass) | Signal smoothing or sharpening | Noise reduction in time series | Frequency cutoff: \(f_c\) |
Wiener Filter | Linear MMSE estimator | Signal denoising | |
Kalman Filter | Recursive state estimator | Tracking cell motion | |
Particle Filter | Sequential Monte Carlo | Nonlinear/noisy tracking | Approximate posterior via particles |
Total Variation (TV) Denoising | Penalizes gradient magnitude | Microscopy deblurring | |
Compressed Sensing | Recovery from undersampling | Accelerated MRI | |
Finite Difference | Approx derivative by discretization | PDE solvers | |
Finite Element Method (FEM) | Domain discretization into elements | Biomechanics, electrophysiology | Weak form: \(a(u, v) = \ell(v)\) for all test functions \(v\) |
Finite Volume Method | Conserves fluxes per cell | Transport models in tissues | |
Monte Carlo Integration | Random sampling for integrals | Partition functions, uncertainty | |
Importance Sampling | Weighted MC estimates | Rare-event modeling | |
Quasi-Monte Carlo | Low-discrepancy sequences | Faster convergence for high-dim integrals | Uses Sobol / Halton sequences |
Automatic Differentiation | Programmatic derivative | Training ML models | Forward & reverse mode |
Backpropagation | Reverse-mode autodiff | Neural nets in bio | via chain rule |
Roofline Model | Performance vs. arithmetic intensity | Kernel optimization | FLOPs/byte tradeoff |
Amdahl’s Law | Parallelism speedup bound | Multi-core scaling limits | |
Gustafson’s Law | Scaling efficiency with workload | HPC bio pipelines | |
Memory Bandwidth Limit | Bytes/sec bottleneck | GPU genomics kernels | Throughput = min(peak compute, memory BW × arithmetic intensity) |
SIMD / GPU Parallelism | Single-instruction, many data | K-mer counting, alignment | Vector ops per cycle |
Sparse Matrix Ops | Efficient storage/computation | Genome graphs, scRNA matrices | Formats: CSR, COO |
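The three HPC scaling rows above (roofline, Amdahl, Gustafson) reduce to one-line formulas. The sketch below writes them out, assuming `p` is the parallelizable fraction of the work and `n` the number of processors. The hardware numbers in the example are invented round figures, not measurements of any real device.

```python
def amdahl_speedup(p, n):
    # Fixed problem size: the serial fraction (1 - p) caps the speedup at 1 / (1 - p).
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p, n):
    # Problem size grows with n: scaled speedup is (1 - p) + p * n.
    return (1.0 - p) + p * n

def roofline(peak_flops, mem_bandwidth, arithmetic_intensity):
    # Attainable FLOP/s = min(peak compute, bandwidth * FLOPs per byte).
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)

print(amdahl_speedup(0.9, 10))       # about 5.26, well below the 10 processors used
print(gustafson_speedup(0.9, 10))    # about 9.1
print(roofline(10e12, 1e12, 2.0))    # memory-bound at 2 FLOPs/byte
print(roofline(10e12, 1e12, 100.0))  # compute-bound at 100 FLOPs/byte
```

The roofline call shows why low-intensity kernels such as k-mer counting tend to be limited by memory bandwidth rather than by arithmetic throughput.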
Bioinformatics and Sequence Mathematics
Component | Definition | In Biology / Computing | Math Variables |
k-mer | Substring of length \(k\) | Genome comparison, sequence hashing | |
Jaccard Index | Set similarity measure | Genome sketching, assembly comparison | |
MinHash | Fast Jaccard approximation | Large-scale sequence similarity | Randomized hashing of k-mers |
Count-Min Sketch | Probabilistic frequency table | Streaming k-mer counts | |
Hamming Distance | Number of differing positions | DNA barcode error correction | |
Levenshtein Distance | Edit distance (insert/del/sub) | Sequence alignment | Minimum edits to transform \(x \to y\) |
Smith–Waterman | Local alignment DP | Short sequence homology | |
Needleman–Wunsch | Global alignment DP | Genome alignment | |
Gotoh Algorithm | Alignment with affine gaps | Realistic indel scoring | Gap cost \(g(\ell) = g_o + g_e \, \ell\) (open \(g_o\), extend \(g_e\), gap length \(\ell\)) |
Substitution Matrix | Scoring amino acid swaps | BLOSUM, PAM | |
Position Weight Matrix (PWM) | Motif probability matrix | TF binding site prediction | \(P_{b,j}\), where \(b\) is the base and \(j\) the motif position |
Hidden Markov Model (Profile HMM) | Motif/sequence family model | Protein domains | Transition + emission probabilities |
Suffix Array | Sorted suffix positions | Fast substring search | \(\mathrm{SA}\) = start indices of suffixes in sorted order |
Suffix Tree | Tree of suffixes | Genome indexing | Nodes = substrings |
Burrows–Wheeler Transform (BWT) | Reversible string transform | Basis of read mappers | Last column of sorted rotations |
FM-Index | Compressed substring index | Read mapping | Supports pattern matching |
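De Bruijn Graph | k-mer graph structure | Genome assembly | Nodes = (k−1)-mers, edges = k-mers |
Eulerian Path | Traverses each edge once | Assembly from k-mers | Exists in a connected directed graph if in-degree = out-degree at every node, except at most one start node (out − in = 1) and one end node (in − out = 1) |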
Hamiltonian Path | Traverses each node once | Overlap-layout assembly | NP-hard |
Phred Score | Log-scaled error probability | Sequencing quality | |
Codon Usage Bias | Frequency of synonymous codons | Expression optimization | CAI, tAI formulas |
Codon Adaptation Index (CAI) | Expression potential metric | Gene design | |
tRNA Adaptation Index (tAI) | Translation efficiency score | Synthetic biology | Based on tRNA availability |
GC Content | Fraction of G+C bases | Genomic stability | |
k-mer Spectrum | Histogram of k-mer counts | Detect heterozygosity, repeats | Distribution of k-mer multiplicities |
Sequence Entropy | Information content of sequence | Motif conservation | |
Motif Scanning (Convolution) | PWM convolution across genome | Regulatory site finding | |
BLAST Scoring | Heuristic local alignment | Homology search | Uses seed-and-extend + substitution matrices |
Phylogenetic Tree | Evolutionary tree model | Ancestral inference | Distance or likelihood based |
Felsenstein Pruning Algorithm | Likelihood computation on tree | Phylogenetic likelihoods | Dynamic programming over nodes |
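Several of the sequence primitives in the table above fit in a few lines each. The sketch below covers k-mer sets with the Jaccard index, Levenshtein distance by dynamic programming, and Phred score conversion using Q = −10 log10(p). The function names and the example sequences are my own illustrations.

```python
import math

def kmers(seq, k):
    # Set of all length-k substrings of seq.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    # |A intersect B| / |A union B|; defined as 1.0 for two empty sets.
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def levenshtein(a, b):
    # Classic DP over prefixes, keeping only the previous row in memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def phred_to_prob(q):
    # Error probability from a Phred quality score.
    return 10 ** (-q / 10)

def prob_to_phred(p):
    # Phred quality score from an error probability.
    return -10 * math.log10(p)

print(jaccard(kmers("ACGTAC", 3), kmers("ACGTTC", 3)))
print(levenshtein("kitten", "sitting"))
print(phred_to_prob(30))
```

MinHash approximates the same Jaccard value by comparing only a small sample of hashed k-mers, which is what makes genome-scale comparisons tractable.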
Systems Biology, Structural Biology & Population Genetics
Component | Definition | In Biology / Computing | Math Variables |
Stoichiometric Matrix (S) | Encodes reaction network | Flux balance analysis (FBA) | |
Flux Balance Analysis (FBA) | Linear optimization on S | Metabolic pathway prediction | |
Flux Variability Analysis (FVA) | Range of feasible fluxes | Robustness of metabolism | Optimizes min/max \(v_i\) under FBA constraints |
Parsimonious FBA (pFBA) | Minimizes total flux | Efficient metabolic solutions | |
Metabolic Control Analysis | Quantifies control coefficients | Sensitivity in pathways | |
Michaelis–Menten | Enzyme kinetics | Reaction velocity | |
Hill Equation | Cooperative binding | TF–DNA regulation | |
Mass-Action Law | Rate ∝ reactant concentrations | Reaction network modeling | |
Arrhenius Equation | Temp dependence of rate | Biochemical kinetics | |
Eyring Equation | Transition state theory | Reaction thermodynamics | |
Gibbs Free Energy | ΔG predicts spontaneity | Protein folding, binding | |
Binding Equilibrium | Ligand–receptor affinity | Protein–drug interactions | |
ΔG–Kd Relation | Thermodynamic link | Quantifying binding strength | |
Force Fields | Energy functions in MD | Protein simulations | |
Lennard–Jones Potential | van der Waals model | Molecular packing | |
Coulomb’s Law | Electrostatic interactions | Charged biomolecules | |
Ewald/PME Summation | Long-range electrostatics | Protein MD | Splits short/long-range terms |
Root Mean Square Deviation (RMSD) | Structure difference metric | Protein structure evaluation | |
TM-score | Protein structural similarity | Structure prediction accuracy | |
Contact Map | Binary residue contacts | Folding, docking models | |
Ramachandran Plot | φ–ψ torsional space | Protein conformational analysis | Allowed regions of \((\phi, \psi)\) |
Rotamer Library | Discrete side-chain conformations | Protein modeling | Probabilities over torsion states |
Free Energy Perturbation (FEP) | ΔΔG between states | Binding affinity prediction | |
Thermodynamic Integration (TI) | Computes ΔG via λ interpolation | Drug design | |
MBAR | Multi-state free-energy estimator | Protein/ligand ΔΔG | Weighted combination of samples |
Hardy–Weinberg Equilibrium | Allele frequency model | Population genetics | |
Wright–Fisher Model | Genetic drift in finite pops | Allele frequency variance | Binomial sampling of alleles each gen |
Moran Model | Overlapping-gen population model | Drift, fixation | One birth + one death per step |
Coalescent Theory | Backward-time genealogy | Ancestral allele inference | Distribution of coalescent times |
Fixation Probability | Probability allele becomes fixed | Selection vs drift | |
Substitution Models | Models nucleotide changes | Phylogenetics | JC69, HKY, GTR matrices |
Felsenstein Pruning | Likelihood on trees | Phylogenetic inference | Recursive likelihood computation |
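Four of the closed-form relations in the table above can be written down directly: Michaelis–Menten, the Hill equation, the ΔG–Kd relation, and Hardy–Weinberg genotype frequencies. The sketch below is a minimal version. It assumes a 1 M standard state for the binding free energy, and the parameter names and example values are illustrative only.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)

def michaelis_menten(s, vmax, km):
    # Reaction velocity v = Vmax * [S] / (Km + [S]).
    return vmax * s / (km + s)

def hill(ligand, k_half, n):
    # Fractional occupancy theta = L^n / (K^n + L^n); n > 1 means cooperativity.
    return ligand**n / (k_half**n + ligand**n)

def delta_g_from_kd(kd_molar, temp_k=298.15):
    # Standard binding free energy in J/mol: dG = R * T * ln(Kd / 1 M).
    return R * temp_k * math.log(kd_molar)

def hardy_weinberg(p):
    # Genotype frequencies (AA, Aa, aa) for allele frequency p under random mating.
    q = 1.0 - p
    return p * p, 2 * p * q, q * q

print(michaelis_menten(2.0, vmax=10.0, km=2.0))  # half of Vmax when [S] = Km
print(hill(3.0, k_half=3.0, n=2))                # half occupancy when L = K
print(delta_g_from_kd(1e-9) / 1000, "kJ/mol")    # a 1 nM binder
print(hardy_weinberg(0.3))
```

A tighter binder has a smaller Kd and therefore a more negative ΔG, so each tenfold improvement in affinity is worth about RT ln 10, roughly 5.7 kJ/mol at room temperature.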
Evaluation, Scaling & Cryptography
(includes some quantum)
Component | Definition | In Biology / Computing | Math Variables |
Accuracy | Fraction of correct predictions | Classifier evaluation | |
Precision | True positives / predicted positives | Gene variant calling | |
Recall (Sensitivity) | True positives / actual positives | Rare mutation detection | |
Specificity | True negatives / actual negatives | Diagnostic screening | |
F1 Score | Harmonic mean of precision & recall | Balancing bio classifier performance | |
ROC Curve / AUC | Tradeoff sensitivity vs specificity | Diagnostic classifiers | AUC = area under the ROC curve |
PR Curve / AUC | Precision–recall tradeoff | Imbalanced omics data | Area under PR curve |
Matthews Correlation (MCC) | Balanced measure even with imbalance | DNA classification | |
Brier Score | Calibrated probability error | Probabilistic predictions | |
Calibration (Temp Scaling) | Adjusts softmax confidence | Protein function prediction | |
Conformal Prediction | Distribution-free prediction intervals | Genomic risk scores | |
Aleatoric Uncertainty | Intrinsic randomness | Sequencing noise | Modeled in likelihood variance |
Epistemic Uncertainty | Model ignorance | Limited training data | Ensembles, Bayesian NNs |
Learning Curve | Error vs dataset size | Scaling genomic models | |
Power Law (Scaling Law) | Performance vs compute/data | Deep learning in bio | |
Chinchilla Law | Optimal compute–data balance | Training large bio models | Compute-optimal training tokens \(D \propto N\) (parameter count) |
Wright’s Law | Cost falls with cumulative production | Sequencing costs | |
Queueing Model (Little’s Law) | Throughput relation | Bio pipeline scheduling | |
Sensitivity Analysis | Effect of parameter variation | Bioprocess robustness | |
Cryptographic Hash | One-way function | Genomic data integrity | |
Homomorphic Encryption (HE) | Compute on ciphertexts | Privacy-preserving genomics | |
Lattice-based Crypto (RLWE) | Hard lattice problem | Secure bio models | |
CKKS Scheme | Approximate HE for reals | Encrypted ML inference | Supports \(+\), \(\times\) on ciphertexts |
Noise Budget | Error growth in HE ops | Bio AI on encrypted data | Ciphertext validity bound |
Quantum Operator Algebra | Linear operators on Hilbert space | Quantum chemistry models | \(\hat{H}\lvert\psi\rangle = E\lvert\psi\rangle\) |
Spectral Decomposition | Expanding in eigenbasis | Quantum Hamiltonians, protein folding | |
Tensor Product | Composite quantum states | Multi-particle biology | |
Density Matrix | Mixed state representation | Open system biology | |
von Neumann Entropy | Entropy of quantum state | Quantum biology analogs | |
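The classification metrics at the top of the table above all derive from the four confusion-matrix counts. The sketch below computes them from TP, FP, TN, and FN. The counts in the example are invented, and the function assumes every denominator is non-zero.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    # All metrics derive from the confusion matrix; assumes non-zero denominators.
    precision = tp / (tp + fp)       # of predicted positives, how many are real
    recall = tp / (tp + fn)          # of real positives, how many were found
    specificity = tn / (tn + fp)     # of real negatives, how many were rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "accuracy": accuracy, "f1": f1, "mcc": mcc}

# Hypothetical variant-calling result: 13 true variants among 100 candidate sites.
metrics = classification_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```

In this imbalanced example accuracy is 0.93 while recall is only about 0.62, which is why F1 and MCC are usually more informative than accuracy for rare-event detection.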