Challenge 1: PCR Fragment Design for Gibson Assembly
What It Does
This script splits a DNA sequence (e.g. a plasmid) into 3 PCR fragments that are:
- 400–800 bp long
- Connected by 20 bp overlaps for Gibson Assembly
It simulates the process of prepping fragments for seamless DNA assembly in the lab.
Why It Matters
Gibson Assembly requires overlapping DNA ends so enzymes can stitch fragments into a continuous sequence.
This tool automatically:
- Chooses valid fragment sizes
- Adds required overlaps
- Verifies the overlaps match exactly
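The splitting and overlap check described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the function names `split_with_overlaps` and `overlaps_match` are hypothetical, the sequence is treated as linear rather than circular, and the fragments are simply cut into near-equal pieces.

```python
def split_with_overlaps(seq, n_fragments=3, overlap=20, min_len=400, max_len=800):
    """Split a linear sequence into fragments whose ends share `overlap` bp.

    Hypothetical helper: near-equal pieces, each (except the last)
    extended `overlap` bp into its downstream neighbour.
    """
    core = len(seq) // n_fragments
    # Cut points between fragments; the last fragment absorbs any remainder.
    cuts = [i * core for i in range(n_fragments)] + [len(seq)]
    fragments = []
    for i in range(n_fragments):
        start = cuts[i]
        is_last = i == n_fragments - 1
        end = cuts[i + 1] + (0 if is_last else overlap)
        frag = seq[start:end]
        if not min_len <= len(frag) <= max_len:
            raise ValueError(
                f"fragment {i + 1} is {len(frag)} bp, outside {min_len}-{max_len} bp"
            )
        fragments.append((start, end, frag))
    return fragments


def overlaps_match(fragments, overlap=20):
    """True if each fragment's last `overlap` bp equal the next one's first."""
    return all(
        left[2][-overlap:] == right[2][:overlap]
        for left, right in zip(fragments, fragments[1:])
    )


# Demo on a made-up 1500 bp sequence (not the real plasmid).
demo_seq = ("ACGTTGCATGGCTAAC" * 100)[:1500]
for start, end, frag in split_with_overlaps(demo_seq):
    print(start, end, len(frag))
```

With three fragments capped at 800 bp and 20 bp overlaps, this sketch only accepts inputs up to roughly 2.4 kb; a longer input raises `ValueError` rather than silently producing oversized fragments.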
Example Input
We use a synthetic antivenom plasmid (pET-28a backbone + SHRT gene insert) as our DNA input (/data/pET28a_SHRT.fasta in the repo):
📄 pET28a_SHRT.fasta
How to Use
- Run the script:
python pcr_fragment_design/design_gibson_fragments.py
- It will:
- Load the FASTA
- Split it into 3 Gibson-ready fragments
- Print fragment coordinates and sequences
- Output:
- 📄 Gibson_Fragment_Design.csv with fragment metadata
- Confirmation that the 20 bp overlaps between adjacent fragments match
Challenge 2: Primer Design with Melting Temperature
What It Does
This script finds a pair of PCR primers that:
- Are 18–24 bp long
- Have melting temperatures (Tm) of 55–65 °C
- Flank a 500 bp amplicon in a plasmid sequence
It scans the entire DNA sequence and returns the first valid 500 bp region that meets these constraints.
Why It Matters
To amplify DNA by PCR, you need two primers that:
- Bind to opposite ends of your target region
- Are thermodynamically stable (right Tm)
- Point toward each other (forward/reverse)
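This script follows the same basic idea as real-world primer design tools like Primer3, but it checks only primer length and Tm, so treat its output as a starting point rather than a guarantee that the primers will behave in the lab.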
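To make "point toward each other" concrete: the forward primer is copied directly from the start of the amplicon, while the reverse primer is the reverse complement of its end. Here is a small sketch of that idea; `primers_for_amplicon` is a hypothetical helper name and the fixed 20 bp primer length is an assumption for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_for_amplicon(template, start, amplicon_length, primer_length=20):
    """Pick fixed-length primers at both edges of an amplicon (hypothetical helper)."""
    amplicon = template[start:start + amplicon_length]
    # Forward primer: same sequence as the top strand at the 5' edge.
    forward = amplicon[:primer_length]
    # Reverse primer: binds the top strand at the 3' edge, so it is written
    # as the reverse complement of that stretch.
    reverse = reverse_complement(amplicon[-primer_length:])
    return forward, reverse


print(primers_for_amplicon("AAAAAAGGGGGG", 0, 12, primer_length=4))
```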
Example Input
We use a clean synthetic plasmid (pET28a_Melittin_clean.fasta) with the Melittin bee venom gene inserted into a pET-28a expression vector:
📄 pET28a_Melittin_clean.fasta
This plasmid is designed to express the Melittin peptide in E. coli under a T7 promoter.
How Melting Temperature (Tm) Is Calculated
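We use the SantaLucia (1998) nearest-neighbor thermodynamic model, which calculates Tm (in °C) using this formula:
$$T_m = \frac{\Delta H}{\Delta S + R \ln(C/4)} - 273.15$$
Where:
- ΔH and ΔS are summed over each dinucleotide pair (e.g. AA/TT, GC/CG), plus initiation terms for the duplex ends
- R is the gas constant (1.987 cal/(mol·K))
- C is the total strand concentration; the factor of 4 applies to non-self-complementary duplexes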
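The nearest-neighbor calculation can also be worked through in plain Python to see where the numbers come from. This is a simplified sketch, not what Biopython does internally: it assumes a non-self-complementary primer, a total strand concentration of 50 nM, and no salt correction, so its values will differ from `Tm_NN()` defaults. The parameter table is transcribed from the published unified nearest-neighbor values and should be checked against the paper or Biopython before being relied on. The name `nn_tm` is hypothetical.

```python
import math

# Nearest-neighbor parameters: (dH in kcal/mol, dS in cal/(mol*K)).
# Keyed by the top-strand dinucleotide read 5'->3'; the other six steps
# are looked up through their reverse complement (e.g. TT -> AA).
NN = {
    "AA": (-7.9, -22.2), "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "GT": (-8.4, -22.4), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9),
}
INIT_GC = (0.1, -2.8)   # initiation term for a terminal G/C pair
INIT_AT = (2.3, 4.1)    # initiation term for a terminal A/T pair
R = 1.987               # gas constant, cal/(mol*K)
_COMP = str.maketrans("ACGT", "TGCA")


def nn_tm(seq, strand_conc=50e-9):
    """Approximate Tm in deg C; assumes no salt correction and no symmetry term."""
    seq = seq.upper()
    dh = ds = 0.0
    # Initiation terms, one for each end of the duplex.
    for base in (seq[0], seq[-1]):
        h, s = INIT_GC if base in "GC" else INIT_AT
        dh += h
        ds += s
    # Sum dH and dS over every overlapping dinucleotide step.
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in NN:
            step = step.translate(_COMP)[::-1]
        h, s = NN[step]
        dh += h
        ds += s
    # dH is in kcal/mol, so scale to cal/mol to match dS and R.
    return 1000 * dh / (ds + R * math.log(strand_conc / 4)) - 273.15


print(round(nn_tm("ATGCGTACGTAGCTAGCTA"), 1))
```

Because the sketch leaves out the salt correction, it is mainly useful for comparing primers against each other; for reported values, use a maintained implementation.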
This model is used by Tm_NN() from Biopython:
from Bio.SeqUtils.MeltingTemp import Tm_NN
# Print the predicted melting temperature (°C) for an example primer
print(Tm_NN("ATGCGTACGTAGCTAGCTA"))
How to Use
- Run the script:
python primer_design.py --fasta pET28a_Melittin_clean.fasta --amplicon_length 500
- The script will:
- Slide a 500 bp window along the DNA
- Scan for forward/reverse primers at the window edges
- Return the first pair with valid Tm
- Output:
- Primer sequences
- Positions
- Melting temperatures
- Amplicon boundaries
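The sliding-window search described in the steps above might look something like this. It is a sketch under stated assumptions, not the repo's `primer_design.py`: to stay dependency-free it swaps in the Wallace rule (2 °C per A/T, 4 °C per G/C) as a stand-in for the nearest-neighbor Tm, and the names `find_primer_pair`, `wallace_tm`, and `first_in_range` are hypothetical. A nearest-neighbor function such as Biopython's `Tm_NN` can be passed as `tm_func` instead.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough Tm estimate (Wallace rule); a stand-in, not the nearest-neighbor model."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def first_in_range(candidates, tm_func, tm_min, tm_max):
    """Return the first candidate primer whose Tm falls in [tm_min, tm_max]."""
    for primer in candidates:
        if tm_min <= tm_func(primer) <= tm_max:
            return primer
    return None


def find_primer_pair(seq, amplicon_length=500, min_len=18, max_len=24,
                     tm_min=55.0, tm_max=65.0, tm_func=wallace_tm):
    """Slide a window along seq; return the first window with two valid primers."""
    lengths = range(min_len, max_len + 1)
    for start in range(len(seq) - amplicon_length + 1):
        window = seq[start:start + amplicon_length]
        # Forward primer candidates grow from the left edge of the window.
        fwd = first_in_range((window[:n] for n in lengths), tm_func, tm_min, tm_max)
        if fwd is None:
            continue
        # Reverse primer candidates are reverse complements of the right edge.
        rev = first_in_range(
            (reverse_complement(window[-n:]) for n in lengths),
            tm_func, tm_min, tm_max,
        )
        if rev is None:
            continue
        return {
            "start": start,
            "end": start + amplicon_length,
            "forward": fwd,
            "reverse": rev,
            "forward_tm": tm_func(fwd),
            "reverse_tm": tm_func(rev),
        }
    return None  # no window satisfied both constraints


print(find_primer_pair("ATGC" * 150))
```

Returning the first hit keeps the search simple, but it also means the result depends on scan order; ranking all valid pairs would be a natural extension.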
Challenge 3: Simulating a Robotic Pipetting Protocol
The task is to dilute each sample to a target concentration of 20 ng/µL while keeping the total volume at or below 50 µL. This one was really, really simple. It could be extended to handle messier input data, but the challenge called for a simple CSV.
What It Does
In molecular biology labs, normalizing DNA concentrations across 96 samples is a common task—especially before pooling for sequencing. Robots usually handle this, but they need instructions.
This script calculates how much sample and water to pipette to reach a final concentration of 20 ng/µL, based on measured DNA concentrations per well.
Inputs & Outputs
Input:
- CSV with sample IDs and measured concentrations (e.g., 45.7 ng/µL)
Output:
- New CSV with pipetting instructions:
- Volume of DNA sample (µL)
- Volume of water (µL)
- Rounded to nearest 0.1 µL
- Max total volume = 50 µL
Why It Matters
This kind of script saves hours in the lab and reduces pipetting errors. It’s trivial for a computer, but error-prone when done manually—especially at scale.
For example, if the sample concentration is 40 ng/µL, reaching 20 ng/µL in 50 µL follows from C1 × V1 = C2 × V2:
Sample volume = 20 × 50 / 40 = 25 µL
Water volume = 50 − 25 = 25 µL
The script outputs that automatically for all 96 samples. You can imagine the same thing happening at a far larger scale, with tens of millions of robots serving tens of billions of people.
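The dilution arithmetic and the CSV round trip can be sketched in a few lines. The column names (`sample_id`, `conc_ng_per_ul`, `sample_ul`, `water_ul`, `note`) and function names are assumptions for illustration, not the challenge's actual format, and flagging samples that are already below the target is an addition that the script described here may not do.

```python
import csv
import io

TARGET_NG_PER_UL = 20.0
TOTAL_UL = 50.0


def dilution_volumes(conc, target=TARGET_NG_PER_UL, total=TOTAL_UL):
    """Return (sample_ul, water_ul) rounded to 0.1 uL, or None if too dilute."""
    if conc < target:
        return None  # dilution cannot raise a concentration
    # C1 * V1 = C2 * V2  ->  V1 = C2 * V2 / C1
    sample = round(target * total / conc, 1)
    water = round(total - sample, 1)
    return sample, water


def normalization_table(csv_text):
    """Turn an input CSV (as text) into pipetting instructions (as CSV text)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample_id", "sample_ul", "water_ul", "note"])
    for row in reader:
        vols = dilution_volumes(float(row["conc_ng_per_ul"]))
        if vols is None:
            writer.writerow([row["sample_id"], "", "", "below target"])
        else:
            writer.writerow([row["sample_id"], vols[0], vols[1], ""])
    return out.getvalue()


print(normalization_table("sample_id,conc_ng_per_ul\nA1,40\nA2,45.7\nA3,10\n"))
```

Always filling to the 50 µL maximum keeps the math uniform across wells; a real protocol might prefer a smaller total volume to conserve sample.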
Next Steps
- Integrate with a real robot's CSV format
- Add error checking for low-concentration edge cases
- Simulate batch pooling for NGS workflows
Final Thoughts
This was a quick one, but even small scripts like this can become part of much larger automation pipelines in biotech. If you're working in a wet lab, automating these tasks is entirely about leverage and predictability.