10/1/25
After running a CPU-based whole-genome analysis in my previous post, I saw firsthand just how slow CPU-based genome analysis is. Even with SIMD, it takes forever. So I looked for alternatives like Parabricks DeepVariant, DRAGEN, and DNAnexus. Sentieon was also on my list of software to review, but its product is extremely optimized CPU software, and I don’t see it staying competitive as we scale up GPU/TPU production and the neural networks optimized for that hardware.
And that’s another thing I found significantly lacking: there aren’t many neural networks doing genome analysis. Whole-genome sequencing companies like Nebula and Nucleus are going for this, meaning they are building neural networks that can read DNA and describe variants, using the health market as a means to create them. Ideally, we would have inference running directly on the consumer’s side.
Even the GPU-based options for variant calling and analysis are just heavily optimized GPU programs, much like the CPU ones, except that they break the embarrassingly parallel sub-computations down and map them tightly onto the specific GPU architecture the analysis runs on.
So, in this project, we’re going to check out the GPU-based variant callers + analysis products and run them in a single pipeline, just like we did with the CPU-based one.
Market options for accelerated analysis
There are a lot of options now, but the three main ones are listed below. We are going to use Parabricks on our local GPU, but there are cloud versions of all three that you can rent from cloud marketplaces. DRAGEN is pretty expensive, and the GPU algorithms are catching up to it in speed.
1. DRAGEN (FPGA)
- FPGA-accelerated
- ~20-30 minutes for full WGS pipeline
- $80,000-150,000 for a chip
- 100-200x faster than CPU
2. NVIDIA Parabricks (GPU)*
- GPU-accelerated (CUDA)
- ~30min-2 hours for full WGS
- NVIDIA GPUs (A100, V100, RTX)
- free if you have your own GPU, or else the cost of running the cloud GPU
- 50-100x faster than CPU
3. DNAnexus (cloud)
- massively parallel cloud compute
- ~1-2 hours (by throwing $ at it)
- pay-per-use, $10-50 per genome
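As a quick sanity check on the rough figures in the list above, we can work out what CPU runtime each runtime/speedup pair implies. This is only arithmetic on the numbers already quoted, not a benchmark, and the helper name implied_cpu_hours is made up for this sketch.

```python
def implied_cpu_hours(minutes_range, speedup_range):
    """Return (low, high) CPU hours implied by an accelerated runtime and a speedup.

    Implied CPU time = accelerated time * speedup, converted from minutes to hours.
    """
    low = minutes_range[0] * speedup_range[0] / 60
    high = minutes_range[1] * speedup_range[1] / 60
    return low, high


# (runtime in minutes, speedup vs CPU), copied from the list above
options = {
    "DRAGEN": ((20, 30), (100, 200)),
    "Parabricks": ((30, 120), (50, 100)),
}

for name, (minutes, speedup) in options.items():
    low, high = implied_cpu_hours(minutes, speedup)
    print(f"{name}: implied CPU baseline of about {low:.0f}-{high:.0f} hours")
```

With the numbers above, this works out to roughly 33-100 hours for DRAGEN and 25-200 hours for Parabricks, so the two sets of figures are at least describing the same ballpark CPU baseline.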
Our pipeline
Our pipeline, for the sake of time and brevity, is going to be the same as the CPU-based one, but with GPU-based software. I wonder how you would rearchitect and design the analysis to be GPU-first…
input: FASTQ files
↓
1. quality control (FastQC/MultiQC)
↓
2. alignment (NVIDIA Parabricks fq2bam) → BAM
↓
3. mark duplicates (NVIDIA Parabricks MarkDuplicates)
↓
4. base quality score recalibration (NVIDIA Parabricks BQSR)
↓
5. variant calling (NVIDIA Parabricks HaplotypeCaller, DeepVariant)
↓
6. output VCF!
↓
7. variant filtering (NVIDIA Parabricks VariantFiltration)
↓
8. variant consequences (SnpEff)
↓
9. variant prioritization (bcftools + awk, QUAL>=30)
↓
10. structural variant detection (cuteSV + CUDA)
↓
11. copy number variant detection (NVIDIA Parabricks GermlineCNVCaller)
↓
output: lots of information about your genome!
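To make the outline above a bit more concrete, here is a minimal sketch of how steps 2-5 and step 9 might look when driven from Python. It only builds and prints the command lines; it does not run anything. Everything in it is an assumption to check against your own setup: the file names are hypothetical, the pbrun flags are the ones I understand from the Parabricks documentation (verify with pbrun <tool> --help on your version), and as far as I know fq2bam also sorts and marks duplicates itself, so steps 2-4 may collapse into a single command. The passes_qual function is a plain-Python stand-in for the bcftools + awk QUAL>=30 filter, not the actual step.

```python
import shlex


def build_commands(ref, fq1, fq2, known_sites, prefix):
    """Build pbrun argument lists for alignment and variant calling.

    All paths are hypothetical; flags should be checked against
    `pbrun <tool> --help` for the installed Parabricks version.
    """
    bam = f"{prefix}.bam"
    recal = f"{prefix}.recal.txt"
    fq2bam = [
        "pbrun", "fq2bam",
        "--ref", ref,
        "--in-fq", fq1, fq2,
        "--knownSites", known_sites,   # known variants used for the BQSR report
        "--out-bam", bam,
        "--out-recal-file", recal,
    ]
    haplotypecaller = [
        "pbrun", "haplotypecaller",
        "--ref", ref,
        "--in-bam", bam,
        "--in-recal-file", recal,      # apply the recalibration from fq2bam
        "--out-variants", f"{prefix}.hc.vcf",
    ]
    deepvariant = [
        "pbrun", "deepvariant",
        "--ref", ref,
        "--in-bam", bam,
        "--out-variants", f"{prefix}.dv.vcf",
    ]
    return [fq2bam, haplotypecaller, deepvariant]


def passes_qual(vcf_line, min_qual=30.0):
    """Keep header lines and records whose QUAL (6th column) is >= min_qual."""
    if vcf_line.startswith("#"):
        return True
    qual = vcf_line.rstrip("\n").split("\t")[5]
    return qual != "." and float(qual) >= min_qual


if __name__ == "__main__":
    commands = build_commands(
        "ref.fasta", "sample_1.fq.gz", "sample_2.fq.gz",
        "known_sites.vcf.gz", "sample",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Building the commands as lists first makes it easy to eyeball them before handing them to subprocess.run, which matters when a single mistyped path can cost an hour of GPU time.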