Description
Modern experimental techniques and medical examination methods generate vast amounts of data that must be carefully protected in accordance with the GDPR. At the same time, handling such data, e.g., human DNA sequences, MRI images, or EEG traces, is necessary for medical research and personalized medicine. Especially when statistical analyses are performed across many datasets, the amount of data is too large to be handled on personal computers or local workstations.
High Performance Computing (HPC) clusters are ideally suited to handling these tasks, even at the terabyte scale, in an acceptable time. Their typical architecture, however, is optimized for performance rather than data security.
We present concepts for providing users of the bwForCluster BinAC 2 with the computational power of a supercomputer as well as data integrity and security compliant with the GDPR and other data protection regulations. This can be achieved by separating and isolating storage, network, and compute nodes for the handling of sensitive data. Dedicated techniques cover the creation of isolated on-the-fly subclusters, the use of advanced containerization and virtualization technologies, and extensive logging for detecting and recording unauthorized data access. At the heart of these activities is the definition and documentation of procedures jointly with the users of the system. The overarching goal is the establishment of an information security management system and its certification according to ISO/IEC 27001.