Note: a more fundamental fix for the problems these regulations seek to address could come from federated architectures or homomorphic encryption. It’s coming, but my god is it slow. It will be interesting to see how regulation adapts to that architectural change.
- HIPAA (US): applies to “protected health information” (PHI), meaning data in clinical/healthcare contexts that can be tied to an individual. Requires encrypting data locally before transfer, then end-to-end encryption in transit
- GINA (US): genetic nondiscrimination, affects how genetic info is handled under HIPAA
- NIH genomic data sharing policies: if you're using NIH-funded controlled-access human genomic and phenotypic datasets, you need to comply. The policy has been updated to align with NIST SP 800-171
- GDPR (EU): applies if any of your data subjects are EU persons, or if your data or services cross the EU. Genetic data is a special category of personal data under GDPR
- Data sovereignty/national laws: some countries require data to reside within their borders, or disallow certain cross-border data flows
- Audit, logging, breach notification laws: there are specifications for log access, what is logged, monitoring, breach policies, cyber-insurance, etc.
Cloud implementation
I have a working Terraform implementation from a genomic analysis project I did that you can use. It should be compliant across the board (except GDPR, because I’m not in Europe).
Otherwise, if you are using Cursor/Codex to scaffold your infrastructure as code, I recommend feeding it the following blurb:
- HIPAA Security Rule: encryption at rest (CMKs), in transit (deny non-TLS requests), access control (least privilege), audit controls (CloudTrail data events + retention/immutability), integrity controls (log validation), person/process controls (SSM, no SSH keys)
- NIH GDS / dbGaP: auditability, least privilege, data segregation, logging of object access, de-identification boundary via buckets/keys, immutable logs
- NIST SP 800-171 / SP 800-53 controls: key management, boundary protection (endpoints, no public egress), configuration management (Config rules), incident response (logs retained), vulnerability/supply-chain (hardened AMIs, ECR mirror)
- GDPR (if relevant): data minimization (zones), access logs, encryption, regional restriction (add bucket policy conditions on region/org)
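To make a few of the items in the blurb concrete, here is a minimal sketch of what "deny non-TLS requests", "encryption at rest with CMKs" and "bucket policy conditions on org" can look like as an S3 bucket policy. It is written in Python only so it can be generated and inspected as JSON; it is not taken from my Terraform implementation. The function name `build_bucket_policy`, the bucket name and the organization ID are hypothetical placeholders. The condition keys (`aws:SecureTransport`, `s3:x-amz-server-side-encryption`, `aws:PrincipalOrgID`) are standard AWS keys, but check the resulting policy against your own setup before relying on it.

```python
import json


def build_bucket_policy(bucket, org_id=None):
    """Build an S3 bucket policy (as a dict) with deny-by-condition statements.

    `bucket` and `org_id` are illustrative inputs, not values from a real account.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"{bucket_arn}/*"

    statements = [
        {
            # In transit: reject any request that did not arrive over TLS.
            "Sid": "DenyNonTlsRequests",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [bucket_arn, object_arn],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
        {
            # At rest: reject uploads that do not request SSE-KMS encryption.
            "Sid": "DenyUploadsWithoutKmsEncryption",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": object_arn,
            "Condition": {
                "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
            },
        },
    ]

    if org_id is not None:
        # Org restriction: reject principals outside the given AWS Organization.
        # Assumption: no AWS service principals need direct access to this bucket.
        statements.append(
            {
                "Sid": "DenyPrincipalsOutsideOrg",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, object_arn],
                "Condition": {"StringNotEquals": {"aws:PrincipalOrgID": org_id}},
            }
        )

    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Placeholder names; substitute your own bucket and organization ID.
    policy = build_bucket_policy("example-genomics-bucket", org_id="o-exampleorgid")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into a Terraform `aws_s3_bucket_policy` resource or compared against whatever your scaffolding tool generates. Everything here is an explicit deny, so it only narrows access; least-privilege allow rules still have to live in IAM.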
| Source | What |
| --- | --- |
| “Genomics Data Transfer, Analytics, and Machine Learning using AWS Services, Appendix D: Compliance Resources” | A list of regulations, gray areas, and how AWS helps with things like HIPAA, GDPR, GINA. |
| “Architecting for Genomic Data Security and Compliance in AWS” (AWS whitepaper) | Best practices for working with controlled-access datasets, setting up access controls, data location, data cleaning, retention, etc. |
| AWS Genomics Guide / Genomics User Guide | Deep dive into security, classification of data, workflows, sharing, analysis. Helps you make design decisions. |
| Navigating HCLS Regulatory and Compliance Requirements on AWS | Health & Life Sciences (HCLS) focus. Talks about audit logs, configuration / change management, DR, etc. |
| AWS GDPR Center | Helps you understand AWS’s part under GDPR, tools, data transfer issues, encryption, etc. |
| AWS blog: Complying with updated NIH Genomic Data Sharing Policies | Brings you up to date on NIH’s newer rules (starting Jan 2025) and how AWS helps satisfy them. |