Linking consent to sequencing data without PHI

Running 12-sample MiSeq batches twice a week, I’m keeping barcodes de-identified while trying to keep consent/IRB records aligned and auditable. How are you mapping sample IDs to FASTQ outputs and documenting chain-of-custody so an ethics review can reconstruct decisions without exposing names?

I map via a salted-hash sample_uid written to the run manifest and FASTQs, and keep the link file (sample_uid→consent_id plus ‘irb_protocol’/‘consent_version’) in an encrypted vault with two-person access. Chain-of-custody goes into a WORM audit log for each handoff (who/when, FASTQ checksums, and a GPG-signed manifest), so an ethics review can replay decisions without names. Would rotating the secret salt per run, with each salt held in escrow, satisfy your IRB, or do they want a single static one?
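
In case it helps, here is a rough sketch of how a per-run salted sample_uid could be derived. It assumes HMAC-SHA256 as the salted hash and a 16-character truncated hex ID; the function names, the `TUBE-0001` style internal ID, and the truncation length are all made up for illustration and are not anyone's actual pipeline.

```python
import hashlib
import hmac
import secrets


def new_run_salt() -> bytes:
    """Generate a fresh secret salt for one sequencing run (to be escrowed)."""
    return secrets.token_bytes(32)


def sample_uid(run_salt: bytes, internal_id: str, length: int = 16) -> str:
    """Derive a de-identified sample_uid from an internal ID and the run salt.

    HMAC-SHA256 is an assumption here; truncating the hex digest keeps the
    ID short enough for manifests and filenames.
    """
    digest = hmac.new(run_salt, internal_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def link_record(uid: str, consent_id: str, irb_protocol: str, consent_version: str) -> dict:
    """Build one row of the link file that stays inside the encrypted vault."""
    return {
        "sample_uid": uid,
        "consent_id": consent_id,
        "irb_protocol": irb_protocol,
        "consent_version": consent_version,
    }


if __name__ == "__main__":
    salt = new_run_salt()
    uid = sample_uid(salt, "TUBE-0001")  # hypothetical internal tube ID
    print(link_record(uid, "C-123", "IRB-EXAMPLE", "v2"))  # placeholder values
```

With a per-run salt, the same internal ID gets a different sample_uid in every run, so cross-run linkage is only possible through the vaulted link file and the escrowed salts.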

One thing that’s worked for us is a run-scoped UUID embedded in the MiSeq sample sheet so it propagates into the FASTQ filenames, with the ID↔consent map kept read-only in MISO LIMS (https://miso-lims.github.io) and a brief ‘decision log’ entry citing consent version and the IRB ticket; auditors get a redacted export keyed only by that UUID. Does your IRB accept cryptographic signatures on manifests? We sign and timestamp the sample sheet and demux reports and park them in S3 Object Lock, and while the retention windows can be annoying, it’s made reconstructions straightforward without exposing names.
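
For anyone wanting to try the UUID approach, a minimal sketch of assigning run-scoped UUIDs and writing them into the `[Data]` section of a sample sheet might look like this. The two-column layout (`Sample_ID`, `index`) and the helper names are assumptions for illustration; a real MiSeq sheet needs the other sections and whatever columns your kit and demux software expect.

```python
import csv
import io
import uuid


def assign_run_uuids(n_samples: int) -> list[str]:
    """Create one random UUID per sample for this run."""
    return [str(uuid.uuid4()) for _ in range(n_samples)]


def data_section(uuids: list[str], indexes: list[str]) -> str:
    """Render a minimal [Data] section keyed only by the run-scoped UUIDs.

    The column set is deliberately minimal and illustrative.
    """
    buf = io.StringIO()
    buf.write("[Data]\n")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Sample_ID", "index"])
    for uid, idx in zip(uuids, indexes):
        writer.writerow([uid, idx])
    return buf.getvalue()


if __name__ == "__main__":
    uids = assign_run_uuids(12)
    placeholder_indexes = ["ATCACG"] * 12  # placeholder index sequences
    print(data_section(uids, placeholder_indexes))
```

The UUID↔consent map itself would be stored separately in the LIMS; nothing in the sheet carries a name or consent ID.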

For our 12-sample runs twice a week, we GPG-sign a per-batch “custody receipt” (CSV of coded tube IDs, timestamps, operator) and drop it in S3 with Object Lock, while the ID<->consent crosswalk stays offline in REDCap. Would that slot into your workflow, @OP? The only gotcha is training techs to re-sign after any amendment so the audit trail stays intact.
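
To make the re-sign gotcha concrete, here is a small sketch of building the receipt CSV and checking whether it has changed since it was last signed. The column names and helper functions are hypothetical, and the signing step is left to GPG itself (for example a detached signature over the receipt file); this only shows the digest comparison that would flag an amended receipt.

```python
import csv
import hashlib
import io


def build_receipt(rows: list[tuple[str, str, str]]) -> bytes:
    """Serialize (tube_id, timestamp_utc, operator) rows into receipt CSV bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["tube_id", "timestamp_utc", "operator"])
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def receipt_digest(receipt: bytes) -> str:
    """SHA-256 of the receipt bytes, recorded alongside the signature."""
    return hashlib.sha256(receipt).hexdigest()


def needs_resign(receipt: bytes, signed_digest: str) -> bool:
    """True if the receipt no longer matches the digest that was signed."""
    return receipt_digest(receipt) != signed_digest


if __name__ == "__main__":
    rows = [("T-001", "2024-01-01T09:00:00Z", "op-A")]  # placeholder values
    signed = receipt_digest(build_receipt(rows))
    rows.append(("T-002", "2024-01-01T09:05:00Z", "op-A"))  # an amendment
    print(needs_resign(build_receipt(rows), signed))
```

A check like this could run before upload, so an amended receipt never lands in the locked bucket with a stale signature.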