Background

Gene therapy is expensive. But boy, when it works, it works.

Therapy	Approximate List Price (U.S.)	Patients Treated (Public Estimate)	Estimated Eligible Pool (U.S.)
Casgevy	~$2.2M (CRISPR-based sickle cell therapy) (European Society of Cardiology, Wikipedia)	~100 globally (early rollout) (Reuters)	~100,000 sickle cell patients in U.S. (U.S. Food and Drug Administration, Reuters)
Lyfgenia	~$3.1M (cell-based, sickle cell) (European Society of Cardiology)	Not yet public; likely single digits to tens	Same broader pool
Roctavian	~$2.9M (hemophilia A) (European Society of Cardiology, Wikipedia)	Not published	~20,000 U.S. with severe hemophilia A
Hemgenix	~$3.5M (hemophilia B) (Fierce Pharma, Investors)	Not disclosed	~4,000–5,000 U.S. severe cases
Beqvez	~$3.5M (hemophilia B; Pfizer) (Investors)	Not yet public; anticipatory	Same as Hemgenix targets
Zynteglo	~$2.8M (beta thalassemia) (Wikipedia)	Limited (withdrawn from the EU due to poor uptake)	~5,000–10,000 dependent patients globally
Lenmeldy	~$4.25M (metachromatic leukodystrophy) (Financial Times)	~37 in pivotal trial; revenue ~$12.7M in 9mo (Financial Times)	~40 U.S. births/year (ultra-rare)
Vyjuvek	~$630K/year (wound gene therapy, EB) (European Society of Cardiology, Wikipedia)	Not public—novel topical product	Rare dystrophic EB patients (low hundreds)
Zevaskyn	~$1.75M per round (RDEB) (Reuters)	Not yet administered; planned availability Q3 2025 (Reuters)	Same target population as EB (rare)
Elevidys	~$3.2M (Duchenne muscular dystrophy) (European Society of Cardiology)	Not public yet	~300–600 U.S. boys (ambulatory DMD age)

We want to help drop these insane costs. The scAAVengr pipeline is an AAV delivery pipeline that attempts to do that, following these steps:

	Step	Description	% wet lab	% computational
1	build barcoded AAV library	make a set of AAV capsid variants, each tagged with a short, unique barcode	100%	0%
2	package reporter + barcode into AAVs	load each vector with the same reporter gene (GFP) and its unique barcode	100%	0%
3	deliver mix to tissue	inject the pooled AAV library into the target tissue (retina, brain)	100%	0%
4	let expression occur	give time for AAVs to infect cells and express reporter	100%	0%
5	single-cell RNA sequencing	capture RNA from individual cells, sequencing both cell barcodes and AAV barcodes	85%	15%
6	map AAV barcodes → cell type	match which capsid variants successfully delivered the reporter into which cell types	0%	100%
7	quantify efficiency	count and compare how many cells of each type each variant infected, normalizing for sequencing depth	0%	100%

My goal is to convert every wet lab step in scAAVengr to be 100% computational without losing information. Of course, doing this fully is infeasible given the sheer density of biological information, but the target is bounded: the simulation only has to reproduce the quantitative measurements that currently determine whether a wet lab experiment succeeded.

If we can match, inside the computer, the evals that the wet lab uses to determine a successful outcome, we can then work on expanding the simulations to be as accurate as the real world, making the experiments we run that much cheaper and faster. This will then drive innovation around evals for even more accurate experimentation, which will still require wet lab work for QA.

If we can turn the AAV-accuracy model into a data acquisition * model architecture problem, this turns into a scale problem. And if the problem is scale, then a business model + engineered fulfillment system can have a shot at opening this frontier for awesome innovation.

So how do we make each wet lab step computational?

Architecture

We start with the scAAVengr pipeline as a guideline; the sections below follow the steps in the table above, in order.

AAV library generation:

what: clone capsid genes + barcodes in plasmids (wet)
computer equivalent: capsid sequence design + barcode embedding entirely in-silico (DNA string operations, randomization, constraint satisfaction)
wet lab QA: Sanger check of a single plasmid
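To make the "DNA string operations, randomization, constraint satisfaction" part concrete, here is a minimal sketch of in-silico barcode design. Everything specific in it is an assumption for illustration: the 12-nt barcode length, the 40–60% GC window, the homopolymer limit, the minimum Hamming distance, and the function names are all hypothetical, and the capsid sequences passed in are whatever variant strings you already have.

```python
import random

BASES = "ACGT"

def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def max_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def design_barcodes(n: int, length: int = 12, min_dist: int = 3, seed: int = 0) -> list[str]:
    """Rejection-sample n barcodes that satisfy the (assumed) constraints."""
    rng = random.Random(seed)
    barcodes: list[str] = []
    attempts = 0
    while len(barcodes) < n:
        attempts += 1
        if attempts > 100_000:
            raise RuntimeError("constraints too strict for the requested library size")
        cand = "".join(rng.choice(BASES) for _ in range(length))
        if not 0.4 <= gc_fraction(cand) <= 0.6:
            continue  # assumed GC window
        if max_homopolymer(cand) > 3:
            continue  # assumed homopolymer limit
        if any(hamming(cand, b) < min_dist for b in barcodes):
            continue  # keep barcodes distinguishable despite sequencing errors
        barcodes.append(cand)
    return barcodes

def build_library(capsid_variants: dict[str, str], **kwargs) -> dict[str, dict[str, str]]:
    """Pair each capsid variant (name -> DNA string) with one unique barcode."""
    barcodes = design_barcodes(len(capsid_variants), **kwargs)
    return {
        name: {"capsid": seq, "barcode": bc}
        for (name, seq), bc in zip(capsid_variants.items(), barcodes)
    }
```

Rejection sampling is the simplest way to satisfy these constraints; for a very large library a smarter search would likely be needed.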

Vector packaging:

what: HEK293 cell AAV production (wet)
computer equivalent: protein folding + capsid assembly simulation (via molecular dynamics simulators like GROMACS/AMBER) + efficiency prediction (ML models trained on prior yields)
wet lab QA: sample to validate predicted titer

Delivery to tissue:

what: animal injection (wet lab)
computer equivalent: multi-scale biodistribution model (PBPK + tissue microenvironment, most likely agent-based) to simulate tropism, dosing, and clearance at the cellular level, customized to the specific species and genome (simulated from a starter genome + cell culture from that animal)
wet lab QA: imaging injections to verify predicted spread in the cell cultures
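As a starting point far simpler than the multi-scale model described above, here is a toy single-compartment sketch: free vector at the injection site is either taken up by a cell type or cleared, each at a first-order rate. It is not PBPK or agent-based, and the rate constants, units, and function name are assumptions chosen only to show the shape of the calculation.

```python
def simulate_biodistribution(dose, uptake_rates, k_clear, hours, dt=0.01):
    """Forward-Euler integration of a toy first-order uptake/clearance model.

    dose:          vector amount injected (arbitrary units)
    uptake_rates:  {cell_type: first-order uptake rate per hour} (hypothetical values)
    k_clear:       first-order clearance rate per hour (hypothetical value)
    Returns (free, taken_up_by_cell_type, cleared).
    """
    free = float(dose)
    taken_up = {cell_type: 0.0 for cell_type in uptake_rates}
    cleared = 0.0
    for _ in range(int(round(hours / dt))):
        # Compute every flux from the same `free` value before updating it.
        fluxes = {ct: k * free * dt for ct, k in uptake_rates.items()}
        loss = k_clear * free * dt
        for ct, flux in fluxes.items():
            taken_up[ct] += flux
        cleared += loss
        free -= sum(fluxes.values()) + loss
    return free, taken_up, cleared
```

In this toy model the long-run share captured by each cell type is its uptake rate divided by the sum of all uptake rates plus clearance, which gives a quick sanity check before layering on anything more realistic.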

Expression:

what: wait for transgene expression in the cells
computer equivalent: transcript kinetics model (ODE-based transcription + translation simulation) informed by promoter strength and epigenetic accessibility data
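A minimal sketch of such an ODE model, assuming the standard two-equation form dm/dt = k_tx - d_m·m and dp/dt = k_tl·m - d_p·p. Treating the transcription rate as promoter strength times an accessibility factor between 0 and 1 is my simplifying assumption here, and all parameter names and values are hypothetical.

```python
def simulate_expression(promoter_strength, accessibility, k_tl, d_m, d_p, hours, dt=0.01):
    """Forward-Euler integration of mRNA (m) and reporter protein (p) levels.

    dm/dt = k_tx - d_m * m      with k_tx = promoter_strength * accessibility (assumed)
    dp/dt = k_tl * m - d_p * p
    """
    k_tx = promoter_strength * accessibility
    m = p = 0.0
    for _ in range(int(round(hours / dt))):
        dm = (k_tx - d_m * m) * dt
        dp = (k_tl * m - d_p * p) * dt
        m += dm
        p += dp
    return m, p

def steady_state(promoter_strength, accessibility, k_tl, d_m, d_p):
    """Closed-form steady state of the same model, useful as a check."""
    m_ss = promoter_strength * accessibility / d_m
    p_ss = k_tl * m_ss / d_p
    return m_ss, p_ss
```

Comparing the simulated values at long times against `steady_state` is a cheap way to catch integration or parameter mistakes before fitting anything to real expression data.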

Single-cell RNA-seq:

what: dissociate cells, capture with 10x, sequence (wet lab)
computer equivalent: generate synthetic scRNA-seq datasets using models fit to known cell-type transcriptomes + AAV barcode mapping
wet lab QA: sequence one sample to check distribution match
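Here is a toy sketch of the synthetic-data idea, assuming a plain multinomial model of per-cell counts and an independent per-cell-type infection probability for each AAV barcode. Real generators fit far richer models; the profiles, cell-type fractions, probabilities, and function names are all hypothetical. The `total_variation` helper is one simple way to phrase the "distribution match" check against a sequenced sample.

```python
import random
from collections import Counter

def synth_cell(rng, profile, depth):
    """Draw `depth` reads for one cell from a gene -> relative abundance profile."""
    genes = list(profile)
    weights = [profile[g] for g in genes]
    return Counter(rng.choices(genes, weights=weights, k=depth))

def synth_dataset(profiles, type_fractions, infection_prob, n_cells, depth=200, seed=0):
    """Generate synthetic cells with a type label, gene counts, and AAV barcodes.

    profiles:        {cell_type: {gene: relative abundance}}
    type_fractions:  {cell_type: expected fraction of cells}
    infection_prob:  {barcode: {cell_type: probability the barcode is detected}}
    """
    rng = random.Random(seed)
    types = list(type_fractions)
    type_weights = [type_fractions[t] for t in types]
    cells = []
    for _ in range(n_cells):
        ct = rng.choices(types, weights=type_weights)[0]
        counts = synth_cell(rng, profiles[ct], depth)
        barcodes = [bc for bc, probs in infection_prob.items()
                    if rng.random() < probs.get(ct, 0.0)]
        cells.append({"cell_type": ct, "counts": counts, "aav_barcodes": barcodes})
    return cells

def cell_type_fractions(cells):
    """Observed fraction of cells per cell type."""
    tally = Counter(c["cell_type"] for c in cells)
    return {ct: n / len(cells) for ct, n in tally.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

The cell dictionaries use the same three keys (`cell_type`, `counts`, `aav_barcodes`) throughout, so the output can feed straight into whatever mapping and scoring code comes next.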

Barcode → cell-type mapping: already computational, includes alignment, demultiplexing, mapping
Efficiency scoring: already computational, includes statistical models to rank capsid x cell type performance
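Since these last two steps are already computational, a minimal sketch is easy to write down. This version assumes alignment and demultiplexing have already produced, for each cell, a cell-type label and the list of AAV barcodes detected in it. It scores each capsid × cell type pair as the fraction of captured cells of that type carrying the variant's barcode, which is one simple normalization and not necessarily the statistic scAAVengr itself uses. The data layout and function names are hypothetical.

```python
from collections import defaultdict

def score_variants(cells, barcode_to_variant):
    """Return {(variant, cell_type): fraction of cells of that type infected}.

    cells:               iterable of {"cell_type": str, "aav_barcodes": list[str]}
    barcode_to_variant:  {barcode: capsid variant name}
    """
    cells_per_type = defaultdict(int)
    hits = defaultdict(int)
    for cell in cells:
        ct = cell["cell_type"]
        cells_per_type[ct] += 1
        # set() so a barcode seen in many reads counts once per cell
        for bc in set(cell["aav_barcodes"]):
            variant = barcode_to_variant.get(bc)
            if variant is not None:  # ignore barcodes not in the library
                hits[(variant, ct)] += 1
    return {key: n / cells_per_type[key[1]] for key, n in hits.items()}

def rank_for_cell_type(scores, cell_type):
    """Variants sorted from most to least efficient for one cell type."""
    ranked = [(v, s) for (v, ct), s in scores.items() if ct == cell_type]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Dividing by the number of captured cells per type keeps abundant cell types from dominating the ranking; correcting for uneven barcode representation in the injected pool would be a natural next refinement.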

Implementation

[WIP: you can find the current state in the GitHub repo, which I am updating. I have taken a step back to get better at implementing architectures and will return in ~2 weeks (8/31/25).]

Conclusion

Moving more of biology into the digital realm will push demand for wet lab work much, much higher, and compute cost, number of experiments, and wet lab demand will become highly intertwined. So the largest constraint on an exponential increase in biological products is input cost; in other words, we should move as much of biology as physically possible into computers. That in turn depends on chip prices, new hardware, the computer science architectures built on top of them, abstractions for biology, regulatory inefficiency, capital cost, and energy costs.
