Collaborative learning from distributed data with differentially private synthetic data.
- Additional information
- Source:
Publisher: BioMed Central Country of Publication: England NLM ID: 101088682 Publication Model: Electronic Cited Medium: Internet ISSN: 1472-6947 (Electronic) Linking ISSN: 14726947 NLM ISO Abbreviation: BMC Med Inform Decis Mak Subsets: MEDLINE
- Publication data:
Original Publication: London : BioMed Central, [2001-
- Subject:
- Abstract:
Background: Consider a setting where multiple parties holding sensitive data aim to collaboratively learn population-level statistics, but pooling the sensitive data sets is not possible due to privacy concerns and parties are unable to engage in centrally coordinated joint computation. We study the feasibility of combining privacy-preserving synthetic data sets in place of the original data for collaborative learning on real-world health data from the UK Biobank.
Methods: We perform an empirical evaluation based on an existing prospective cohort study from the literature. Multiple parties were simulated by splitting the UK Biobank cohort along assessment centers, for which we generate synthetic data using differentially private generative modelling techniques. We then apply the original study's Poisson regression analysis on the combined synthetic data sets and evaluate the effects of 1) the size of the local data set, 2) the number of participating parties, and 3) local shifts in distributions, on the obtained likelihood scores.
Results: We discover that parties engaging in collaborative learning via shared synthetic data obtain more accurate estimates of the regression parameters compared to using only their local data. This finding extends to the difficult case of small heterogeneous data sets. Furthermore, the more parties participate, the larger and more consistent the improvements become, up to a certain limit. Finally, we find that data sharing can especially help parties whose data contain underrepresented groups to perform better-adjusted analysis for said groups.
Conclusions: Based on our results we conclude that sharing of synthetic data is a viable method for enabling learning from sensitive data without violating privacy constraints even if individual data sets are small or do not represent the overall population well. Lack of access to distributed sensitive data is often a bottleneck in biomedical research, which our study shows can be alleviated with privacy-preserving collaborative learning methods.
(© 2024. The Author(s).)
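The workflow summarized in the Methods above (each party releases differentially private synthetic data, and an analyst fits a Poisson regression on the combined synthetic sets) can be illustrated with a toy sketch. This is not the authors' pipeline: the data are simulated rather than taken from the UK Biobank, and the generative model is swapped for a simple Laplace-noised histogram, which is assumed to give epsilon-differential privacy under add/remove-one-record neighbouring data sets. All names (`dp_synthetic`, `fit_poisson`, `N_SYNTH`, `MAX_COUNT`) and parameter values are hypothetical choices for illustration only.

```python
import numpy as np

MAX_COUNT = 10   # assumed public bound on the outcome; larger counts are clipped
N_SYNTH = 1000   # assumed public, data-independent synthetic sample size

def dp_synthetic(x, y, epsilon, rng):
    """Sample synthetic (x, y) pairs from a Laplace-noised histogram.

    x is a binary covariate and y a count outcome. Each record falls in
    exactly one cell, so adding or removing a record changes one cell by 1
    (L1 sensitivity 1), and Laplace noise of scale 1/epsilon is used.
    Clipping, normalizing and sampling are post-processing steps.
    """
    y = np.clip(y, 0, MAX_COUNT)
    hist = np.zeros((2, MAX_COUNT + 1))
    np.add.at(hist, (x, y), 1.0)
    noisy = hist + rng.laplace(scale=1.0 / epsilon, size=hist.shape)
    probs = np.clip(noisy, 0.0, None).ravel()
    probs /= probs.sum()
    cells = rng.choice(probs.size, size=N_SYNTH, p=probs)
    return cells // (MAX_COUNT + 1), cells % (MAX_COUNT + 1)

def fit_poisson(x, y, n_iter=25):
    """Fit log E[y] = b0 + b1 * x by Newton's method; returns [b0, b1]."""
    X = np.column_stack([np.ones(len(x)), x]).astype(float)
    beta = np.array([np.log(np.mean(y) + 1e-9), 0.0])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                 # score of the Poisson log-likelihood
        hess = X.T @ (X * mu[:, None])        # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
true_beta = np.array([0.5, 0.7])  # simulated ground truth, not from the study

# Simulate five parties, each holding its own local data set.
synthetic_sets = []
for _ in range(5):
    x_local = rng.binomial(1, 0.5, size=1000)
    y_local = rng.poisson(np.exp(true_beta[0] + true_beta[1] * x_local))
    synthetic_sets.append(dp_synthetic(x_local, y_local, epsilon=1.0, rng=rng))

# The analyst only ever sees the pooled synthetic data.
x_pooled = np.concatenate([s[0] for s in synthetic_sets])
y_pooled = np.concatenate([s[1] for s in synthetic_sets])
beta_pooled = fit_poisson(x_pooled, y_pooled)
print("pooled synthetic estimate:", beta_pooled)
```

Clipping negative noisy counts to zero adds a small upward bias in sparsely populated cells, which is one reason a histogram is only a stand-in here for the generative models the study refers to.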
- References:
Circ Cardiovasc Qual Outcomes. 2019 Jul;12(7):e005122. (PMID: 31284738)
Biometrika. 1947;34(1-2):28-35. (PMID: 20287819)
PLoS Med. 2015 Mar 31;12(3):e1001779. (PMID: 25826379)
Patterns (N Y). 2021 Jun 07;2(7):100271. (PMID: 34286296)
BMC Med. 2020 May 29;18(1):160. (PMID: 32466757)
- Grant Information:
325572 Research Council of Finland; 325572 Research Council of Finland; 325573 Research Council of Finland; 325572 Research Council of Finland; Project 101070617 European Union; 336032 Strategic Research Council (SRC) established within the Research Council of Finland; EP/W002973/1 UK Research and Innovation
- Contributed Indexing:
Keywords: Collaborative learning; Differential privacy; Health informatics; Synthetic data
- Subject:
Date Created: 20240614 Date Completed: 20240615 Latest Revision: 20240618
- Subject:
20240618
- Identifier:
PMC11179391
- Identifier:
10.1186/s12911-024-02563-7
- Identifier:
38877563