What regulations are relevant for cloud-based bioinformatics?

Note: many of the problems these regulations seek to address could be solved more fundamentally with federated architectures and homomorphic encryption. It’s coming, but my god is it slow. It will be interesting to see how regulation adapts to that architectural change.

  1. HIPAA (US): covers “protected health information” (PHI), meaning data from clinical/healthcare contexts that can be tied to an individual. In practice this calls for encrypting data locally before transfer, then end-to-end encryption
  2. GINA (US): genetic nondiscrimination; affects how genetic information is handled under HIPAA
  3. NIH Genomic Data Sharing (GDS) policy: applies if you are using NIH-funded controlled-access human genomic and phenotypic datasets. It has been updated to align with NIST SP 800-171
  4. GDPR (EU): applies if any of your data subjects are in the EU, or if your data or services cross the EU. Genetic data is a special category of personal data under GDPR
  5. Data sovereignty/national laws: data may have to reside within a given country, or certain cross-border data flows may be disallowed
  6. Audit, logging, and breach notification laws: these specify who can access logs, what must be logged, monitoring, breach-response policies, cyber-insurance, etc.
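As a rough illustration of how the list above can be turned into a first-pass checklist, here is a minimal Python sketch. The `DatasetProfile` fields and the `regimes_to_review` function are hypothetical names invented for this example, and the rules only mirror the conditions stated in the list; it is a prompt for which regimes to look into, not a legal determination.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical flags describing a dataset (illustrative, not an official taxonomy)."""
    contains_phi: bool = False            # identifiable data from a clinical/healthcare context
    contains_genetic_data: bool = False
    nih_controlled_access: bool = False   # NIH-funded controlled-access data
    has_eu_data_subjects: bool = False    # or data/services crossing the EU
    residency_restricted: bool = False    # a national law constrains where data may live


def regimes_to_review(profile: DatasetProfile) -> list[str]:
    """Return the regimes from the list above that the profile suggests reviewing."""
    regimes = []
    if profile.contains_phi:
        regimes.append("HIPAA")
        # Per the list above, GINA affects how genetic info is handled under HIPAA.
        if profile.contains_genetic_data:
            regimes.append("GINA")
    if profile.nih_controlled_access:
        regimes.append("NIH GDS")
    if profile.has_eu_data_subjects:
        regimes.append("GDPR")
    if profile.residency_restricted:
        regimes.append("Data sovereignty")
    # Assumption: audit/logging/breach obligations are worth reviewing in every case.
    regimes.append("Audit/logging/breach notification")
    return regimes


profile = DatasetProfile(contains_phi=True, contains_genetic_data=True,
                         has_eu_data_subjects=True)
print(regimes_to_review(profile))
```

Keeping the flags explicit makes it easy to record, per dataset, why a given regime was or was not considered.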

Cloud implementation

I have an actual implementation in Terraform for a genomic analysis project I did that you can use. It should be compliant across the board (except GDPR because I’m not in Europe).

Otherwise, if you are using Cursor/Codex to scaffold your infrastructure as code, I recommend feeding it the following blurb:

  • HIPAA Security Rule: encryption at rest (customer-managed KMS keys, CMKs), encryption in transit (deny non-TLS requests), access control (least privilege), audit controls (CloudTrail data events + retention/immutability), integrity controls (log file validation), person/process controls (SSM for access, no SSH keys)
  • NIH GDS / dbGaP: auditability, least privilege, data segregation, logging of object access, de-identification boundary via separate buckets/keys, immutable logs
  • NIST SP 800-171 / 800-53 controls: key management, boundary protection (VPC endpoints, no public egress), configuration management (AWS Config rules), incident response (logs retained), vulnerability/supply-chain (hardened AMIs, ECR mirror)
  • GDPR (if relevant): data minimization (zones), access logs, encryption, regional restriction (add bucket policy conditions on region/org)
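To make a few of these controls concrete (deny non-TLS requests, require KMS encryption on upload, restrict by org and region), here is a minimal Python sketch that builds an S3 bucket policy document. It is not the Terraform implementation mentioned above. The function name `build_bucket_policy` and the bucket name, org ID, and region in the usage lines are hypothetical placeholders; the condition keys (`aws:SecureTransport`, `s3:x-amz-server-side-encryption`, `aws:PrincipalOrgID`, `aws:RequestedRegion`) are standard AWS ones. Treat it as a starting point to review, not a compliant policy.

```python
import json


def build_bucket_policy(bucket: str, org_id: str, allowed_regions: list[str]) -> dict:
    """Build a deny-by-condition S3 bucket policy (illustrative sketch only)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"{bucket_arn}/*"
    both = [bucket_arn, object_arn]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Encryption in transit: reject any request not made over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny", "Principal": "*", "Action": "s3:*",
                "Resource": both,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {   # Encryption at rest: reject uploads that do not request SSE-KMS.
                # Caveat: uploads that omit the header and rely on bucket default
                # encryption are expected to be denied by this statement too.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny", "Principal": "*", "Action": "s3:PutObject",
                "Resource": object_arn,
                "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}},
            },
            {   # Org restriction. Caveat: this also blocks AWS service principals
                # (for example log delivery), which would need an explicit carve-out.
                "Sid": "DenyOutsideOrg",
                "Effect": "Deny", "Principal": "*", "Action": "s3:*",
                "Resource": both,
                "Condition": {"StringNotEquals": {"aws:PrincipalOrgID": org_id}},
            },
            {   # Regional restriction: reject requests made to other regions.
                "Sid": "DenyOutsideRegions",
                "Effect": "Deny", "Principal": "*", "Action": "s3:*",
                "Resource": both,
                "Condition": {"StringNotEquals": {"aws:RequestedRegion": allowed_regions}},
            },
        ],
    }


# Hypothetical placeholder values, for illustration only.
policy = build_bucket_policy("example-genomics-raw", "o-exampleorgid", ["us-east-1"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into a bucket policy or handed to an IaC tool; audit controls such as CloudTrail data events and log immutability are configured separately and are not covered by this sketch.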

Sources

  • “Genomics Data Transfer, Analytics, and Machine Learning using AWS Services, Appendix D: Compliance Resources”: a list of regulations, gray areas, and how AWS helps with things like HIPAA, GDPR, GINA. (AWS Documentation)
  • “Architecting for Genomic Data Security and Compliance in AWS” (AWS whitepaper): best practices for working with controlled-access datasets, setting up access controls, data location, data cleaning, retention, etc. (AWS Static)
  • AWS Genomics Guide / Genomics User Guide: deep dive into security, classification of data, workflows, sharing, analysis. Helps you make design decisions. (AWS Static)
  • Navigating HCLS Regulatory and Compliance Requirements on AWS: Health & Life Sciences (HCLS) focus. Covers audit logs, configuration/change management, DR, etc. (AWS Static)
  • AWS GDPR Center: helps you understand AWS’s part under GDPR, tools, data transfer issues, encryption, etc. (Amazon Web Services, Inc.)
  • AWS blog, “Complying with updated NIH Genomic Data Sharing Policies”: brings you up to date on NIH’s newer rules (starting Jan 2025) and how AWS helps satisfy them. (Amazon Web Services, Inc.)