Summary
Problem: Manual tasks in molecular biology, like designing PCR fragments or normalizing sample concentrations, are repetitive and error-prone, especially at scale.
Approach: We build Python scripts for lab automation:
- Gibson fragment design: Splits plasmid sequences into Gibson-compatible 400–800 bp fragments with 20 bp overlap.
- Primer design: Scans sequences (e.g., Melittin in pET-28a) to yield 18–24 bp primers within a Tₘ window of 55–65 °C using SantaLucia’s nearest-neighbor model (via Biopython).
- Pipetting simulator: Calculates sample + water volumes to achieve 20 ng/µL for 96-well plates, capped at 50 µL.
Impact: This automates everyday molecular biology preparation by translating standard protocols into scripts that reduce human error and streamline wet-lab readiness, which is especially valuable for high-throughput or automated lab workflows.
Challenge 1: PCR Fragment Design for Gibson Assembly
What It Does
This script splits a DNA sequence (e.g. a plasmid) into 3 PCR fragments that are:
- 400–800 bp long
- Connected by 20 bp overlaps for Gibson Assembly
It simulates the process of prepping fragments for seamless DNA assembly in the lab.
Why It Matters
Gibson Assembly requires overlapping DNA ends so enzymes can stitch fragments into a continuous sequence.
This tool automatically:
- Chooses valid fragment sizes
- Adds required overlaps
- Verifies the overlaps match exactly
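The three steps above can be sketched in a few lines. This is a minimal illustration, not the repo's actual implementation: it assumes a linear sequence and near-equal fragment sizes, and the function names `split_with_overlaps` and `overlaps_match` are hypothetical.

```python
import random


def split_with_overlaps(seq, n_fragments=3, overlap=20, min_len=400, max_len=800):
    """Split a linear sequence into fragments where neighbours share `overlap` bases.

    Returns a list of (start, end, fragment_sequence) tuples using 0-based,
    end-exclusive coordinates. Assumption: fragments are sized as equally as possible.
    """
    # Each junction is counted twice, so the fragments together span this many bases.
    total = len(seq) + (n_fragments - 1) * overlap
    base, extra = divmod(total, n_fragments)
    fragments = []
    start = 0
    for i in range(n_fragments):
        length = base + (1 if i < extra else 0)
        if not min_len <= length <= max_len:
            raise ValueError(f"fragment length {length} bp is outside {min_len}-{max_len} bp")
        end = start + length
        fragments.append((start, end, seq[start:end]))
        # The next fragment begins `overlap` bases before this one ends.
        start = end - overlap
    return fragments


def overlaps_match(fragments, overlap=20):
    """Check that each fragment's last `overlap` bases equal the next fragment's first ones."""
    return all(
        left[2][-overlap:] == right[2][:overlap]
        for left, right in zip(fragments, fragments[1:])
    )


# Demo on a random 1500 bp sequence (a stand-in, not the plasmid from the repo).
rng = random.Random(0)
demo_seq = "".join(rng.choice("ACGT") for _ in range(1500))
fragments = split_with_overlaps(demo_seq)
for start, end, frag in fragments:
    print(start, end, len(frag))
print("overlaps verified:", overlaps_match(fragments))
```

With three fragments capped at 800 bp, this sketch rejects inputs longer than about 2.36 kb, so a longer template would need more fragments or a wider size range.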
Example Input
We use a synthetic antivenom plasmid (pET-28a backbone + SHRT gene insert) as our DNA input (/data/pET28a_SHRT.fasta in the repo):
📄 pET28a_SHRT.fasta
How to Use
- Run the script:
python pcr_fragment_design/design_gibson_fragments.py
- It will:
- Load the FASTA
- Split it into 3 Gibson-ready fragments
- Print fragment coordinates and sequences
- Output:
- 📄 Gibson_Fragment_Design.csv with fragment metadata
- Verified 20 bp overlaps between adjacent fragments
Challenge 2: Primer Design with Melting Temperature
What It Does
This script finds a pair of PCR primers that:
- Are 18–24 bp long
- Have melting temperatures (Tm) between 55 and 65 °C
- Flank a 500 bp amplicon in a plasmid sequence
It scans along the DNA sequence and returns the first 500 bp region whose primers meet these constraints.
Why It Matters
To amplify DNA by PCR, you need two primers that:
- Bind to opposite ends of your target region
- Are thermodynamically stable (right Tm)
- Point toward each other (forward/reverse)
This script mirrors, in simplified form, what real-world primer design tools like Primer3 do: it checks primer length and Tm so that primers are more likely to behave well under lab conditions.
Example Input
We use a clean synthetic plasmid (pET28a_Melittin_clean.fasta) with the Melittin bee venom gene inserted into a pET-28a expression vector:
📄 pET28a_Melittin_clean.fasta
This plasmid is designed to express the Melittin peptide in E. coli under a T7 promoter.
How Melting Temperature (Tm) Is Calculated
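We use the SantaLucia 1998 Nearest-Neighbor Thermodynamic Model, which calculates Tm using this formula:
$$T_m = \frac{\Delta H}{\Delta S + R \ln C} - 273.15$$
Where:
- ΔH and ΔS are summed over each dinucleotide pair (e.g. AA/TT, GC/CG)
- R is the gas constant
- C is strand concentration (the exact concentration term depends on whether the strands are self-complementary)
- Subtracting 273.15 converts the result from kelvin to °C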
This model is used by Tm_NN() from Biopython:
from Bio.SeqUtils.MeltingTemp import Tm_NN
Tm_NN("ATGCGTACGTAGCTAGCTA")
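To make the arithmetic concrete, here is a small sketch of the nearest-neighbor relation Tm = ΔH / (ΔS + R ln C) − 273.15 using only the standard library. The function name `tm_from_thermo` and the input totals are hypothetical round numbers chosen for illustration; they are not measured values for any real primer, and the sketch ignores salt corrections that a library call may apply.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol*K)


def tm_from_thermo(delta_h_kcal, delta_s_cal, strand_conc_molar):
    """Return Tm in deg C from summed enthalpy, summed entropy and strand concentration.

    delta_h_kcal: total enthalpy in kcal/mol (negative for duplex formation)
    delta_s_cal: total entropy in cal/(mol*K)
    strand_conc_molar: effective strand concentration C in mol/L
    """
    delta_h_cal = delta_h_kcal * 1000.0  # convert kcal/mol to cal/mol to match dS and R
    tm_kelvin = delta_h_cal / (delta_s_cal + R_CAL * math.log(strand_conc_molar))
    return tm_kelvin - 273.15


# Hypothetical totals: dH = -150 kcal/mol, dS = -400 cal/(mol*K), C = 250 nM.
example_tm = tm_from_thermo(-150.0, -400.0, 2.5e-7)
print(f"{example_tm:.1f} deg C")
```

Because ln C is negative for sub-molar concentrations, raising the strand concentration shrinks the magnitude of the denominator and pushes Tm up, which is why concentration has to be stated alongside any quoted Tm.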
How to Use
- Run the script:
python primer_design.py --fasta pET28a_Melittin_clean.fasta --amplicon_length 500
- The script will:
- Slide a 500 bp window along the DNA
- Scan for forward/reverse primers at the window edges
- Return the first pair with valid Tm
- Output:
- Primer sequences
- Positions
- Melting temperatures
- Amplicon boundaries
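The sliding-window scan described above might look roughly like the sketch below. It is not the repo's `primer_design.py`: the function names are hypothetical, and to stay dependency-free it defaults to the simple Wallace rule (Tm = 2·(A+T) + 4·(G+C)) as a stand-in for the nearest-neighbor model. Assuming Biopython is installed, `Bio.SeqUtils.MeltingTemp.Tm_NN` can be passed as `tm_func` instead.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def wallace_tm(seq):
    """Rough Tm estimate (Wallace rule); a stand-in for a nearest-neighbor model."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def pick_primer(region, tm_func, min_len=18, max_len=24, tm_min=55.0, tm_max=65.0):
    """Return the shortest 5' prefix of `region` whose Tm is in the window, else None."""
    for n in range(min_len, min(max_len, len(region)) + 1):
        candidate = region[:n]
        if tm_min <= tm_func(candidate) <= tm_max:
            return candidate
    return None


def find_primer_pair(seq, amplicon_length=500, tm_func=wallace_tm):
    """Slide a window along `seq` and return the first window with two valid primers."""
    seq = seq.upper()
    for start in range(len(seq) - amplicon_length + 1):
        end = start + amplicon_length
        window = seq[start:end]
        # Forward primer reads 5'->3' from the left edge of the window.
        forward = pick_primer(window, tm_func)
        # Reverse primer reads 5'->3' on the opposite strand from the right edge.
        reverse = pick_primer(reverse_complement(window), tm_func)
        if forward and reverse:
            return {
                "amplicon_start": start,
                "amplicon_end": end,
                "forward": forward,
                "reverse": reverse,
                "forward_tm": tm_func(forward),
                "reverse_tm": tm_func(reverse),
            }
    return None


# Demo on a synthetic repeat (a stand-in, not the Melittin plasmid).
demo_seq = "ACGT" * 150
result = find_primer_pair(demo_seq)
print(result)
```

Returning the first hit keeps the scan simple, at the cost of ignoring checks that dedicated tools perform, such as hairpins, primer dimers, and GC clamps.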
Challenge 3: Simulating a Robotic Pipetting Protocol
The task is to dilute samples to a target concentration of 20 ng/µL while keeping the total volume at or below 50 µL. This one was really, really simple. Improvements can be made based on the complexity of the input sample data, but the challenge called for a simple CSV.
What It Does
In molecular biology labs, normalizing DNA concentrations across 96 samples is a common task—especially before pooling for sequencing. Robots usually handle this, but they need instructions.
This script calculates how much sample and water to pipette to reach a final concentration of 20 ng/µL, based on measured DNA concentrations per well.
Inputs & Outputs
Input:
- CSV with sample IDs and measured concentrations (e.g., 45.7 ng/µL)
Output:
- New CSV with pipetting instructions:
- Volume of DNA sample (µL)
- Volume of water (µL)
- Rounded to nearest 0.1 µL
- Max total volume = 50 µL
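The calculation behind these outputs is the dilution relation C1·V1 = C2·V2. The sketch below shows one way to apply it per row; the column names (`sample_id`, `concentration_ng_per_ul`, `sample_ul`, `water_ul`) and function names are assumptions rather than the repo's actual format, and the handling of samples already at or below the target (use sample only, no water) is likewise an assumption.

```python
import csv
import io

TARGET_NG_PER_UL = 20.0
MAX_TOTAL_UL = 50.0


def normalization_volumes(conc, target=TARGET_NG_PER_UL, total=MAX_TOTAL_UL):
    """Return (sample_ul, water_ul), rounded to 0.1 uL, for one measured concentration."""
    if conc <= 0:
        raise ValueError("concentration must be positive")
    if conc <= target:
        # Assumption: a sample at or below target cannot be diluted further,
        # so pipette sample only and add no water.
        return round(total, 1), 0.0
    sample_ul = round(target * total / conc, 1)  # V1 = C2 * V2 / C1
    water_ul = round(total - sample_ul, 1)
    return sample_ul, water_ul


def build_instructions(in_file, out_file):
    """Read concentrations from a CSV file object and write pipetting volumes to another."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=["sample_id", "sample_ul", "water_ul"])
    writer.writeheader()
    for row in reader:
        sample_ul, water_ul = normalization_volumes(float(row["concentration_ng_per_ul"]))
        writer.writerow(
            {"sample_id": row["sample_id"], "sample_ul": sample_ul, "water_ul": water_ul}
        )


# Demo with three made-up wells held in memory instead of files on disk.
demo_in = io.StringIO("sample_id,concentration_ng_per_ul\nA1,40.0\nA2,45.7\nA3,12.0\n")
demo_out = io.StringIO()
build_instructions(demo_in, demo_out)
print(demo_out.getvalue())
```

Computing the water volume from the already-rounded sample volume keeps the two numbers summing to the total, which matters more to a liquid handler than the last decimal of either one.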
Why It Matters
This kind of script saves hours in the lab and reduces pipetting errors. It’s trivial for a computer, but error-prone when done manually—especially at scale.
If sample concentration = 40 ng/µL, to reach 20 ng/µL in 50 µL:
- Sample volume = 20 × 50 / 40 = 25 µL
- Water volume = 50 − 25 = 25 µL
The script outputs that automatically for all 96 samples. Now imagine the same thing happening at scale, with tens of millions of robots serving tens of billions of people.
Next Steps
- Integrate with a real robot's CSV format
- Add error checking for low-concentration edge cases
- Simulate batch pooling for NGS workflows
Final Thoughts
This was a quick one, but even small scripts like this can become part of much larger automation pipelines in biotech. If you're working in a wet lab, automating these tasks is entirely about leverage and predictability.