Venugopal, Rohit and Shafqat, Noman and Venugopal, Ishwar and Tillbury, Benjamin Mark John and Stafford, Harry Demetrios and Bourazeri, Aikaterini (2022) Privacy preserving Generative Adversarial Networks to model Electronic Health Records. Neural Networks, 153. pp. 339-348. DOI https://doi.org/10.1016/j.neunet.2022.06.022
Abstract
Hospitals and General Practitioner (GP) surgeries within the National Health Service (NHS) routinely collect patient information to create personal health records covering details such as family medical history, chronic diseases, medications and dosing. The collected information could be used to build machine learning models that simplify the work of those within the NHS. However, such Electronic Health Records are not made publicly available due to privacy concerns. In our paper, we propose a privacy-preserving Generative Adversarial Network (pGAN), which can generate high-quality synthetic data while preserving the privacy and statistical properties of the source data. pGAN is evaluated on two distinct datasets, one framed as a classification task and the other as a regression task. The privacy score of the generated data is calculated using Nearest Neighbour Adversarial Accuracy. Cosine similarity scores of synthetic data from our proposed model indicate that the generated data is similar in nature to the original, but not identical. Additionally, our proposed model was able to preserve privacy while maintaining high utility. Machine learning models trained on synthetic data and on original data achieved accuracies of 74.3% and 74.5%, respectively, on the classification dataset, and R² scores of 0.84 and 0.85, respectively, on the regression dataset. Our results, therefore, indicate that synthetic data from the proposed model could replace original data for machine learning while preserving privacy.
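The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of Nearest Neighbour Adversarial Accuracy (the average of two leave-one-out nearest-neighbour indicator rates, with Euclidean distance), which may differ in detail from the procedure used in the paper. The function names `nn_adversarial_accuracy` and `mean_max_cosine_similarity` are hypothetical, and the inputs are assumed to be numeric arrays of shape (records, features) with no all-zero rows.

```python
import numpy as np

def _min_dist(a, b, exclude_self=False):
    # Pairwise Euclidean distances between rows of a and rows of b.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    if exclude_self:
        # Leave-one-out: a record must not count as its own neighbour.
        np.fill_diagonal(d, np.inf)
    return d.min(axis=1)

def nn_adversarial_accuracy(real, synth):
    # Assumed definition: a score near 0.5 suggests synthetic records are
    # about as close to real records as real records are to each other;
    # a score near 0 suggests synthetic records are near-copies.
    d_rs = _min_dist(real, synth)
    d_rr = _min_dist(real, real, exclude_self=True)
    d_sr = _min_dist(synth, real)
    d_ss = _min_dist(synth, synth, exclude_self=True)
    return 0.5 * (np.mean(d_rs > d_rr) + np.mean(d_sr > d_ss))

def mean_max_cosine_similarity(real, synth):
    # For each synthetic record, take the cosine similarity to its most
    # similar real record, then average over all synthetic records.
    r = real / np.linalg.norm(real, axis=1, keepdims=True)
    s = synth / np.linalg.norm(synth, axis=1, keepdims=True)
    return float((s @ r.T).max(axis=1).mean())
```

Under this assumed definition, an exact copy of the real data scores 0 on adversarial accuracy and 1 on cosine similarity, which is the "identical" case the abstract says the synthetic data avoids.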
| Field | Value |
|---|---|
| Item Type | Article |
| Uncontrolled Keywords | AI; GAN; Machine learning; Privacy; Public health data |
| Divisions | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor | Unnamed user with email elements@essex.ac.uk |
| Depositing User | Unnamed user with email elements@essex.ac.uk |
| Date Deposited | 20 Jul 2022 10:33 |
| Last Modified | 30 Oct 2024 20:47 |
| URI | http://repository.essex.ac.uk/id/eprint/33082 |
Available files
Filename: 1-s2.0-S0893608022002374-main.pdf
Licence: Creative Commons: Attribution 3.0