
Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection.

  • Additional information
    • Source:
      Publisher: BioMed Central; Country of Publication: France; NLM ID: 9114088; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1297-9686 (Electronic); Linking ISSN: 0999193X; NLM ISO Abbreviation: Genet Sel Evol; Subsets: MEDLINE
    • Publication information:
      Publication: London : BioMed Central
      Original Publication: Paris : Elsevier, c1989-
    • Subject:
    • Abstract:
      Background: Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. In this study, we investigated whether the split-and-merge Bayesian stochastic search variable selection (BSSVS) model could overcome these issues. BSSVS is performed first on subsets of sequence-based variants and then on a merged dataset containing variants selected in the first step.
      Results: We used a dataset that included 4,154,064 variants after editing and de-regressed proofs for 3415 reference and 2138 validation bulls for somatic cell score, protein yield and interval from first to last insemination. In the first step, BSSVS was performed on 106 subsets, each containing ~39,189 variants. In the second step, between 1060 and 472,492 variants selected in the first step were included to estimate the accuracy of genomic prediction. Accuracies were at best equal to those achieved with the commonly used Bovine 50k-SNP chip, although the number of variants within a few well-known quantitative trait loci regions was considerably enriched. When variant selection and the final genomic prediction were performed on the same data, predictions were biased. Predictions computed as the average of the predictions for each subset achieved the highest accuracies, i.e. 0.5 to 1.1 % higher than the accuracies obtained with the 50k-SNP chip, and yielded the least biased predictions. Finally, the accuracy of genomic predictions obtained when all sequence-based variants were included was similar to, or up to 1.4 % lower than, that based on the average predictions across the subsets. With parallelization, the split-and-merge procedure was completed in 5 days, whereas the standard analysis including all sequence-based variants took more than three months.
      Conclusions: The split-and-merge approach splits one large computational task into many much smaller ones, which allows the use of parallel processing and thus efficient genomic prediction based on whole-genome sequence data. The split-and-merge approach did not improve prediction accuracy, probably because we used data on a single breed in which individuals were closely related. Nevertheless, the split-and-merge approach may have potential for applications on data from multiple breeds.
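The bookkeeping behind the split step and the averaged predictions described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `split_variants` and `average_predictions` are hypothetical, contiguous chunks are assumed because the abstract does not say how variants were assigned to subsets, and the BSSVS model itself is not implemented here.

```python
from statistics import mean


def split_variants(n_variants, n_subsets):
    """Partition variant indices 0..n_variants-1 into near-equal chunks.

    Assumption: contiguous chunks. The abstract does not state how
    variants were actually assigned to subsets.
    """
    base, extra = divmod(n_variants, n_subsets)
    subsets, start = [], 0
    for k in range(n_subsets):
        # The first `extra` subsets receive one additional variant.
        size = base + (1 if k < extra else 0)
        subsets.append(range(start, start + size))
        start += size
    return subsets


def average_predictions(per_subset_predictions):
    """Average each animal's predicted value across all subsets.

    `per_subset_predictions` holds one list per subset, each with one
    predicted value per animal, in the same animal order.
    """
    return [mean(values) for values in zip(*per_subset_predictions)]


# Numbers taken from the abstract: 4,154,064 variants in 106 subsets.
subsets = split_variants(4_154_064, 106)
sizes = sorted({len(s) for s in subsets})  # distinct subset sizes

# Toy predictions (made-up values) for two animals from two subsets.
toy_average = average_predictions([[1.0, 2.0], [3.0, 4.0]])
```

With these inputs every subset holds 39,189 or 39,190 variants, which agrees with the ~39,189 variants per subset reported above. Because each subset can be analysed independently, the per-subset runs are what make parallel processing possible.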
    • Dates:
      Date Created: 20160701 Date Completed: 20170215 Latest Revision: 20181113
    • Update code:
      20240829
    • Identifier (PMCID):
      PMC4926307
    • Identifier (DOI):
      10.1186/s12711-016-0225-x
    • Identifier (PMID):
      27357580