Regulations for biotech agents

Note: the problems these regulations try to address could be solved more fundamentally with federated architectures or homomorphic encryption, where raw data never has to leave its owner's environment in the clear. It’s coming, but my god is it slow. It will be interesting to see how regulation adapts to that architecture change.
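
To make the federated idea concrete, here is a minimal sketch, assuming each site is allowed to release only a count and a sum for some measurement. The names `SiteSummary`, `local_summary`, and `federated_mean` are hypothetical and not taken from any particular framework; real federated systems add secure aggregation, minimum cohort sizes, and often differential privacy on top of this.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics a site releases; no row-level data (hypothetical type)."""
    n: int
    total: float


def local_summary(values: Sequence[float]) -> SiteSummary:
    # Runs inside each site's own boundary; only the count and sum are returned.
    return SiteSummary(n=len(values), total=float(sum(values)))


def federated_mean(summaries: List[SiteSummary]) -> float:
    # The coordinator only ever sees per-site counts and sums, never raw values.
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations across sites")
    return sum(s.total for s in summaries) / n
```

For example, `federated_mean([local_summary(site_a_values), local_summary(site_b_values)])` gives the pooled mean without either site shipping individual records. Note that aggregates from very small sites can still leak information, which is one reason this alone would not settle the regulatory question.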

  1. HIPAA: applies to “protected health information” (PHI), meaning data from clinical/healthcare contexts that can be tied to an individual. Encrypt locally before the data leaves the machine, then end to end in transit; restrict access to PHI, log disclosures, and sign Business Associate Agreements. Changes to data must also be logged, and models trained on this data need traceability, an access history store, and a reproducible model version so that expected inputs map to the same outputs
  2. GINA (US): prohibits discrimination based on genetic information in health insurance and employment, which affects how genetic info is handled under HIPAA. You want to ensure your agents don’t use genetic data to bias responses (I believe this regulation will need tuning as agents start automatically running computations on our genomes and other biological data)
  3. GDPR (EU): applies if any of your data subjects are in the EU or your service is offered in the EU. Genetic data is a special category of personal data under GDPR
  4. NIH genomic data sharing policies: if you're using NIH-funded controlled-access human genomic and phenotypic datasets, you must comply with these. The security requirements have been updated to align with NIST SP 800-171:
    1. NIH Data Management & Sharing Policy (2023): you must submit and follow a data-management plan describing how scientific data will be stored, protected, and shared
    2. NIH Genomic Data Sharing Policy: requires controlled access for human genomic data, IRB approval, and participant consent for secondary use
    3. NIH Security Requirements (aligned with NIST SP 800-171): encrypt all CUI, use multi-factor authentication, audit access, and report incidents within 1 hour
    4. NIH Grants Policy Statement: mandates financial and research record retention, sub-award monitoring, and institutional oversight
    5. ClinicalTrials.gov Reporting: register and post study results for NIH-funded clinical research
  5. 21 CFR Part 11 (FDA electronic records and signatures): validate software, log all actions, control access, link electronic signatures to their records, and ensure data integrity
  6. GxP (GMP/GLP/GCP): document every process, validate systems, and maintain full traceability of records and personnel (MLflow can help with the model and experiment tracking part of this)
  7. GDPR (obligations; see item 3 for scope): obtain consent, minimize stored data, honor deletion requests, and control cross-border transfers
  8. ICH E6(R2) / E8(R1): implement risk-based quality management, qualify vendors, and use validated systems
  9. SaMD / ISO 13485 / IEC 62304: maintain design-control documentation, risk analysis, versioning, and post-market monitoring
  10. FedRAMP / AWS GovCloud: use approved cloud regions, implement continuous monitoring, and follow NIST SP 800-53 controls
  11. EU GMP Annex 11: ensure validated computerized systems, audit trails, change control, and secure electronic records
  12. NIST SP 800-171: overlaps with everything else. Encrypt data at rest and in transit; enforce access control (least privilege, MFA, RBAC); maintain audit logs of all access and system changes; continuously monitor for security incidents and vulnerabilities; document policies for configuration, training, and incident response
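
Almost every item above asks for an audit trail of who touched what, and several ask that the trail be tamper-evident. One common way to get tamper evidence is a hash chain, where each entry stores the hash of the previous one. The sketch below is an illustration under assumptions, not a compliant implementation: the field names (`actor`, `action`, `resource`, `model_version`) and the functions `append_entry` and `verify_chain` are hypothetical, and a real system would also need durable, access-controlled storage and trusted timestamps.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _digest(body: Dict[str, Any]) -> str:
    # Canonical JSON so the same body always produces the same hash.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, Any]], actor: str, action: str,
                 resource: str, timestamp: str,
                 model_version: str = "") -> Dict[str, Any]:
    """Append an audit entry that commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {
        "actor": actor,
        "action": action,
        "resource": resource,
        "timestamp": timestamp,
        "model_version": model_version,  # ties a model run to its version
        "prev_hash": prev_hash,
    }
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify_chain(log: List[Dict[str, Any]]) -> bool:
    """Return True only if no entry was edited, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev_hash") != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

Editing any earlier entry changes its hash, which breaks the `prev_hash` link of every later entry, so `verify_chain` fails. That gives you integrity checking for the log itself; it does not replace access control or retention policies.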

Cloud implementation

[UPDATE 10/15/25: need to support way more laws]

I have an actual Terraform implementation, from a genomic analysis project I did, that you can use. It should be compliant across the board (except GDPR, because I’m not in Europe).

Otherwise, if you are using Cursor/Codex to scaffold your infrastructure as code, I recommend feeding it the following blurb:

  • HIPAA Security Rule: encryption at rest (CMKs), in transit (deny non-TLS requests), access control (least privilege), audit controls (CloudTrail data events + retention/immutability), integrity controls (log validation), person/process controls (SSM Session Manager, no SSH keys)
  • NIH GDS / dbGaP: auditability, least privilege, data segregation, logging of object access, de-identification boundary via buckets/keys, immutable logs
  • NIST SP 800-171 / SP 800-53 controls: key management, boundary protection (endpoints, no public egress), configuration management (Config rules), incident response (logs retained), vulnerability/supply-chain (hardened AMIs, ECR mirror)
  • GDPR (if relevant): data minimization (zones), access logs, encryption, regional restriction (add bucket policy conditions on region/org)
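
A few of the controls in the blurb translate directly into S3 bucket policy statements. The sketch below builds such a policy as a Python dict, assuming a single data bucket. The function names, the bucket name, and the organization ID are placeholders I made up; the condition keys `aws:SecureTransport`, `aws:PrincipalOrgID`, and `s3:x-amz-server-side-encryption` are standard AWS condition keys. Treat it as a starting point to review, not as a policy that is compliant on its own.

```python
import json
from typing import Any, Dict


def build_bucket_policy(bucket: str, org_id: str) -> Dict[str, Any]:
    """Build a deny-by-condition S3 bucket policy (illustrative sketch)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Encryption in transit: reject any request not made over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Org restriction: reject principals outside the organization.
                # Caution: this also blocks AWS service principals, so it suits
                # data buckets better than log-delivery buckets.
                "Sid": "DenyOutsideOrganization",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"StringNotEquals": {"aws:PrincipalOrgID": org_id}},
            },
            {
                # Encryption at rest: reject uploads that do not request SSE-KMS.
                "Sid": "DenyNonKmsUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
                },
            },
        ],
    }


def to_json(policy: Dict[str, Any]) -> str:
    # Serialized form, e.g. for pasting into a Terraform policy argument.
    return json.dumps(policy, indent=2)
```

Calling `to_json(build_bucket_policy("example-genomics-data", "o-exampleorgid"))` produces JSON you can diff against whatever your scaffolding tool generated, which is a quick way to check that the blurb's requirements actually made it into the infrastructure code.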

Sources and what they cover

  • “Genomics Data Transfer, Analytics, and Machine Learning using AWS Services, Appendix D: Compliance Resources”: a list of regulations, gray areas, and how AWS helps with things like HIPAA, GDPR, and GINA.
  • “Architecting for Genomic Data Security and Compliance in AWS” (AWS whitepaper): best practices for working with controlled-access datasets, setting up access controls, data location, data cleaning, retention, etc.
  • AWS Genomics Guide / Genomics User Guide: deep dive into security, classification of data, workflows, sharing, and analysis. Helps you make design decisions.
  • Navigating HCLS Regulatory and Compliance Requirements on AWS: Health & Life Sciences (HCLS) focus. Talks about audit logs, configuration/change management, DR, etc.
  • AWS GDPR Center: helps you understand AWS’s part under GDPR, tools, data transfer issues, encryption, etc.
  • AWS blog, “Complying with updated NIH Genomic Data Sharing Policies”: brings you up to date on NIH’s newer rules (starting Jan 2025) and how AWS helps satisfy them.