ViPAR: a software platform for the Virtual Pooling and Analysis of Research Data

Carter, KW and Francis, RW and Bresnahan, M and Gissler, M and Grønborg, TK and Gross, R and Gunnes, N and Hammond, G and Hornig, M and Hultman, CM and Huttunen, J and Langridge, A and Leonard, H and Newman, S and Parner, ET and Petersson, G and Reichenberg, A and Sandin, S and Schendel, DE and Schalkwyk, L and Sourander, A and Steadman, C and Stoltenberg, C and Suominen, A and Surén, P and Susser, E and Sylvester Vethanayagam, A and Yusof, Z (2016) ViPAR: a software platform for the Virtual Pooling and Analysis of Research Data. International Journal of Epidemiology, 45 (2). pp. 408-416. DOI https://doi.org/10.1093/ije/dyv193

Abstract

Background: Research studies exploring the determinants of disease require sufficient statistical power to detect meaningful effects. Sample size is often increased through centralized pooling of disparately located datasets, though ethical, privacy and data ownership issues can often hamper this process. Methods that facilitate the sharing of research data that are sympathetic with these issues and which allow flexible and detailed statistical analyses are therefore in critical need. We have created a software platform for the Virtual Pooling and Analysis of Research data (ViPAR), which employs free and open source methods to provide researchers with a web-based platform to analyse datasets housed in disparate locations.

Methods: Database federation permits controlled access to remotely located datasets from a central location. The Secure Shell protocol allows data to be securely exchanged between devices over an insecure network. ViPAR combines these free technologies into a solution that facilitates 'virtual pooling' where data can be temporarily pooled into computer memory and made available for analysis without the need for permanent central storage.

Results: Within the ViPAR infrastructure, remote sites manage their own harmonized research dataset in a database hosted at their site, while a central server hosts the data federation component and a secure analysis portal. When an analysis is initiated, requested data are retrieved from each remote site and virtually pooled at the central site. The data are then analysed by statistical software and, on completion, results of the analysis are returned to the user and the virtually pooled data are removed from memory.

Conclusions: ViPAR is a secure, flexible and powerful analysis platform built on open source technology that is currently in use by large international consortia, and is made publicly available at [http://bioinformatics.childhealthresearch.org.au/software/vipar/].
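
The virtual pooling workflow described in the abstract (retrieve the requested data from each site, pool them in memory, analyse, then discard the pooled copy) can be illustrated with a minimal Python sketch. This is not ViPAR's implementation: the in-memory SQLite databases stand in for remotely hosted site databases, no Secure Shell transport or database federation layer is modelled, and the names `make_site`, `virtual_pool`, `analyse`, the `cohort` table and its columns are all hypothetical.

```python
import sqlite3
from statistics import mean

# Hypothetical harmonized schema shared by all sites (an assumption for this sketch).
ALLOWED_COLUMNS = {"subject_id", "age", "outcome"}


def make_site(rows):
    """Stand-in for a remote site: an in-memory database with a harmonized table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cohort (subject_id INTEGER, age REAL, outcome INTEGER)")
    conn.executemany("INSERT INTO cohort VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def virtual_pool(sites, columns):
    """Retrieve only the requested columns from each site and pool them in memory."""
    unknown = set(columns) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"columns not in harmonized schema: {sorted(unknown)}")
    query = f"SELECT {', '.join(columns)} FROM cohort"
    pooled = []
    for site_name, conn in sites.items():
        for row in conn.execute(query):
            # Tag each row with its site of origin; nothing is written to disk.
            pooled.append((site_name, *row))
    return pooled


def analyse(sites, columns, analysis):
    """Pool the data, run the analysis, then clear the pooled copy from memory."""
    pooled = virtual_pool(sites, columns)
    try:
        return analysis(pooled)
    finally:
        pooled.clear()  # only the result leaves this function, not the pooled rows


def mean_age(pooled):
    """Example analysis: mean of the first requested column across all sites."""
    return mean(row[1] for row in pooled)


sites = {
    "site_a": make_site([(1, 34.0, 0), (2, 41.0, 1)]),
    "site_b": make_site([(1, 29.0, 0), (2, 38.0, 1), (3, 45.0, 1)]),
}

result = analyse(sites, ["age"], mean_age)
print(result)
```

In this sketch each site keeps ownership of its own table, and the central code only ever holds a temporary list of the requested columns. A real deployment would also need authentication, encrypted transport and access controls, which are outside the scope of this illustration.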

Item Metadata

Item Type:	Article
Uncontrolled Keywords:	ViPAR; data sharing; data federation; data pooling
Subjects:	R Medicine > RC Internal medicine > RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry
Divisions:	Faculty of Science and Health; Faculty of Science and Health > Life Sciences, School of
SWORD Depositor:	Unnamed user with email elements@essex.ac.uk
Depositing User:	Unnamed user with email elements@essex.ac.uk
Date Deposited:	18 Apr 2016 14:13
Last Modified:	04 Dec 2024 06:36
URI:	http://repository.essex.ac.uk/id/eprint/15882

Available files

Published Version

Filename: dyv193.pdf

Licence: Creative Commons: Attribution-Noncommercial 3.0
