Background
Gene therapy is expensive. But boy, when it works, it works.
Therapy | Approximate List Price (U.S.) | Patients Treated (Public Estimate) | Estimated Eligible Pool (U.S.) |
Casgevy | ~$2.2M (CRISPR-based sickle cell therapy) (European Society of Cardiology, Wikipedia) | ~100 globally (early rollout) (Reuters) | ~100,000 sickle cell patients in U.S. (U.S. Food and Drug Administration, Reuters) |
Lyfgenia | ~$3.1M (cell-based, sickle cell) (European Society of Cardiology) | Not yet public; likely single digits to tens | Same broader pool |
Roctavian | ~$2.9M (hemophilia A) (European Society of Cardiology, Wikipedia) | Not published | ~20,000 U.S. with severe hemophilia A |
Hemgenix | ~$3.5M (hemophilia B) (Fierce Pharma, Investors) | Not disclosed | ~4,000–5,000 U.S. severe cases |
Beqvez | ~$3.5M (hemophilia B; Pfizer) (Investors) | Not yet public; anticipatory | Same as Hemgenix targets |
Zynteglo | ~$2.8M (beta thalassemia) (Wikipedia) | Limited (withdrawn in the EU due to poor uptake) | ~5,000–10,000 dependent patients globally |
Lenmeldy | ~$4.25M (metachromatic leukodystrophy) (Financial Times) | ~37 in pivotal trial; revenue ~$12.7M in 9mo (Financial Times) | ~40 U.S. births/year (ultra-rare) |
Vyjuvek | ~$630K/year (wound gene therapy, EB) (European Society of Cardiology, Wikipedia) | Not public—novel topical product | Rare dystrophic EB patients (low hundreds) |
Zevaskyn | ~$1.75M per round (RDEB) (Reuters) | Not yet administered; planned availability Q3 2025 (Reuters) | Same target population as EB (rare) |
Elevidys | ~$3.2M (Duchenne muscular dystrophy) (European Society of Cardiology) | Not public yet | ~300–600 U.S. boys (ambulatory DMD age) |
We want to help drop these insane costs. The scAAVengr pipeline, which screens AAV capsid variants for how well they deliver a payload to each cell type, is one attempt at that. It follows these steps:
No. | Step | Description | % wet lab | % computational
1 | build barcoded AAV library | make a set of AAV capsid variants, each tagged with a short, unique barcode | 100% | 0% |
2 | package reporter + barcode into AAVs | load each vector with the same reporter gene (GFP) and its unique barcode | 100% | 0% |
3 | deliver mix to tissue | inject the pooled AAV library into the target tissue (retina, brain) | 100% | 0% |
4 | let expression occur | give time for AAVs to infect cells and express reporter | 100% | 0% |
5 | single-cell RNA sequencing | capture RNA from individual cells, sequencing both cell barcodes and AAV barcodes | 85% | 15% |
6 | map AAV barcodes → cell type | match which capsid variants successfully delivered the reporter into which cell types | 0% | 100% |
7 | quantify efficiency | count and compare how many cells of each type each variant infected, normalizing for sequencing depth | 0% | 100% |
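Steps 6 and 7 are already computational, so they are the easiest to sketch. The snippet below is a minimal illustration, not the scAAVengr implementation: the input layout, the `score_capsids` name, the `min_umis` threshold, and the counts-per-10,000 depth normalization are all assumptions made for this example.

```python
from collections import defaultdict

def score_capsids(cells, min_umis=1, scale=1e4):
    """Score each (capsid, cell type) pair from annotated single-cell data.

    cells: list of dicts with (hypothetical) keys
        'cell_type'    : label assigned by clustering/annotation
        'total_umis'   : all UMIs captured in the cell (its sequencing depth)
        'barcode_umis' : dict capsid_id -> UMIs carrying that capsid's barcode
    Returns {(capsid_id, cell_type): (fraction_transduced, mean_norm_expr)}.
    """
    n_cells = defaultdict(int)     # cells seen per cell type
    n_hit = defaultdict(int)       # transduced cells per (capsid, cell type)
    expr_sum = defaultdict(float)  # depth-normalized barcode expression

    for cell in cells:
        ctype = cell["cell_type"]
        n_cells[ctype] += 1
        for capsid, umis in cell["barcode_umis"].items():
            if umis >= min_umis:
                key = (capsid, ctype)
                n_hit[key] += 1
                # Normalize by the cell's depth so deeply sequenced cells
                # do not dominate the expression average.
                expr_sum[key] += umis / cell["total_umis"] * scale

    return {
        key: (n_hit[key] / n_cells[key[1]], expr_sum[key] / n_cells[key[1]])
        for key in n_hit
    }

# Toy data, made up for illustration only.
cells = [
    {"cell_type": "rod", "total_umis": 2000, "barcode_umis": {"capA": 4}},
    {"cell_type": "rod", "total_umis": 1000, "barcode_umis": {}},
    {"cell_type": "cone", "total_umis": 5000, "barcode_umis": {"capA": 1, "capB": 10}},
]
for (capsid, ctype), (frac, expr) in sorted(score_capsids(cells).items()):
    print(capsid, ctype, frac, round(expr, 2))
```

Reporting both the fraction of cells reached and a depth-normalized expression level keeps "how many cells" separate from "how strongly", which is one plausible way to read the two halves of step 7.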
My goal is to convert every wet lab step in scAAVengr to be 100% computational without losing information. Of course, fully achieving this is infeasible given the sheer density of biological information, but the target is bounded: the simulation only has to reproduce the quantitative measurements that currently determine whether a wet lab experiment succeeded.
If we can match, inside the computer, the evals that the wet lab uses to determine a successful outcome, we can then work on making the simulations as accurate as the real world, which makes every experiment we run that much cheaper and faster. This will in turn drive innovation around evals for even more accurate experimentation, which will still require wet lab work for QA.
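To make "matching the evals" concrete, here is one way such a check could look. This is a sketch under loud assumptions: the choice of Spearman rank correlation, the 0.8 threshold, and the `passes_eval` name are all hypothetical, picked only to show the shape of a simulated-versus-wet-lab comparison over capsid × cell type scores.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def passes_eval(simulated, wet_lab, threshold=0.8):
    """Compare scores on the (capsid, cell type) pairs both sources cover."""
    keys = sorted(set(simulated) & set(wet_lab))
    rho = spearman([simulated[k] for k in keys], [wet_lab[k] for k in keys])
    return rho, rho >= threshold

# Toy scores, made up for illustration only.
simulated = {("capA", "rod"): 0.5, ("capA", "cone"): 0.9, ("capB", "rod"): 0.1}
wet_lab = {("capA", "rod"): 0.4, ("capA", "cone"): 0.7, ("capB", "rod"): 0.2}
print(passes_eval(simulated, wet_lab))
```

A rank-based metric only asks whether the simulation orders capsids the way the bench does. A stricter eval would also compare the magnitudes.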
If we can turn the AAV-accuracy model into a data acquisition × model architecture problem, it becomes a scale problem. And if the problem is scale, then a business model plus an engineered fulfillment system has a shot at opening this frontier for awesome innovation.
So how do we make each wet lab step computational?
Architecture
We start with the scAAVengr pipeline as a guideline (the items below follow the same order as the steps in the table above).
1. AAV library generation:
   - what: clone capsid genes + barcodes into plasmids (wet lab)
   - computer equivalent: capsid sequence design + barcode embedding entirely in silico (DNA string operations, randomization, constraint satisfaction)
   - wet lab QA: Sanger-sequence a single plasmid
2. Vector packaging:
   - what: AAV production in HEK293 cells (wet lab)
   - computer equivalent: protein folding + capsid assembly simulation (via molecular dynamics packages such as GROMACS or AMBER) + efficiency prediction (ML models trained on prior yields)
   - wet lab QA: produce one sample to validate the predicted titer
3. Delivery to tissue:
   - what: animal injection (wet lab)
   - computer equivalent: multi-scale biodistribution model (PBPK + tissue microenvironment, most likely agent-based) to simulate tropism, dosing, and clearance at the cellular level, customized to the specific species and genome (simulated from a starter genome + a cell culture from that animal)
   - wet lab QA: image injections to verify the predicted spread in the cell cultures
4. Expression:
   - what: wait for transgene expression in the cells (wet lab)
   - computer equivalent: transcript kinetics model (ODE-based transcription + translation simulation) informed by promoter strength and epigenetic accessibility data
5. Single-cell RNA-seq:
   - what: dissociate cells, capture with 10x, sequence (wet lab)
   - computer equivalent: generate synthetic scRNA-seq datasets using models fit to known cell-type transcriptomes + AAV barcode mapping
   - wet lab QA: sequence one sample to check that the distributions match
6. Barcode → cell-type mapping: already computational; includes alignment, demultiplexing, and mapping
7. Efficiency scoring: already computational; includes statistical models to rank capsid × cell type performance
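As one example of what a "computer equivalent" could look like, the expression step names an ODE-based transcription + translation model. Below is a minimal sketch of the standard two-stage form (mRNA made at a constant rate and degraded, protein translated from mRNA and degraded), integrated with forward Euler. Every rate constant, the `simulate_expression` name, and the way promoter strength and accessibility scale the transcription rate are assumptions for illustration, not measured values or the project's actual model.

```python
def simulate_expression(k_tx, k_tl, d_m, d_p, t_end, dt=0.01):
    """Forward-Euler integration of the two-stage expression model

        dm/dt = k_tx - d_m * m        (transcription, mRNA decay)
        dp/dt = k_tl * m - d_p * p    (translation, protein decay)

    starting from m = p = 0. Returns (times, mrna, protein) as lists.
    """
    steps = int(round(t_end / dt))
    m, p = 0.0, 0.0
    times, mrna, protein = [0.0], [m], [p]
    for i in range(1, steps + 1):
        dm = k_tx - d_m * m
        dp = k_tl * m - d_p * p
        m += dm * dt
        p += dp * dt
        times.append(i * dt)
        mrna.append(m)
        protein.append(p)
    return times, mrna, protein

# Hypothetical inputs: a maximal transcription rate scaled by a relative
# promoter strength and a chromatin accessibility score, both in [0, 1].
k_max = 4.0
promoter_strength = 0.5
accessibility = 1.0
k_tx = k_max * promoter_strength * accessibility

times, mrna, protein = simulate_expression(
    k_tx=k_tx, k_tl=5.0, d_m=0.5, d_p=0.1, t_end=200.0
)

# Analytical steady state of the model, useful as a sanity check:
#   m* = k_tx / d_m,   p* = k_tl * m* / d_p
print(mrna[-1], protein[-1])
```

Forward Euler is used here only to keep the sketch dependency-free. A stiff or larger system would call for a proper ODE solver, and the model would need fitting against measured reporter time courses before it could stand in for the waiting period in step 4.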
Implementation
[WIP: you can find the current state in the GitHub repo. I am updating this daily.]
Conclusion
Moving more of biology into the digital realm will push demand for wet lab work much, much higher, because cost, number of experiments, and wet lab demand are tightly intertwined: the cheaper simulated experiments get, the more candidates become worth validating at the bench. So the largest constraint on an exponential increase in biological products is input cost, and the way to lower it is to move as much of biology as physically possible into computers.