
Assessing the generalization capabilities of TCR binding predictors via peptide distance analysis.

  • Additional Information
    • Source:
      Publisher: Public Library of Science; Country of Publication: United States; NLM ID: 101285081; Publication Model: eCollection; Cited Medium: Internet; ISSN: 1932-6203 (Electronic); Linking ISSN: 19326203; NLM ISO Abbreviation: PLoS One; Subsets: MEDLINE
    • Publication Details:
      Original Publication: San Francisco, CA : Public Library of Science
    • Abstract:
      Competing Interests: Funding for this project was provided by NEC Labs Europe GmbH.
      Understanding the interaction between T cell receptors (TCRs) and peptide-bound major histocompatibility complexes (pMHCs) is crucial for comprehending immune responses and developing targeted immunotherapies. While recent machine learning (ML) models predict TCR-pMHC binding remarkably well within their training data, they often fail to generalize to peptides outside their training distributions, raising concerns about their applicability in therapeutic settings. Understanding and improving the generalization of these models is therefore critical for real-world applications. To address this issue, we evaluate how the distance between training and testing peptide distributions affects the empirical risk estimates of ML models, using sequence-based and 3D structure-based distance metrics. Our analysis covers several state-of-the-art models for TCR-peptide binding prediction: Attentive Variational Information Bottleneck (AVIB), NetTCR-2.0 and NetTCR-2.2, and two variants of ERGO II (pre-trained autoencoder and LSTM). We introduce a novel approach for assessing the generalization capabilities of TCR binding predictors: the Distance Split (DS) algorithm. The DS algorithm controls the distance between training and testing peptides based on both sequence and structure, allowing a more nuanced evaluation of model performance. We show that lower 3D shape similarity between training and test peptides is associated with a harder out-of-distribution task, which is more informative when measuring the ability to generalize to unseen peptides. However, we observe the opposite effect when splitting by sequence-based similarity. These findings highlight the importance of distance-based splitting for benchmarking models. Such splits could also be used to estimate a confidence score for predictions on novel, unseen peptides, based on how different they are from the training peptides. Additionally, our results suggest that using 3D shape to complement sequence information could improve the accuracy of TCR-pMHC binding predictors.
      (Copyright: © 2025 Castorina et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
    • Identifier:
      0 (Receptors, Antigen, T-Cell)
      0 (Peptides)
    • Entry Date:
      Date Created: 20250520; Date Completed: 20250521; Latest Revision: 20250522
    • Update Code:
      20250522
    • Identifier (PMCID):
      PMC12091837
    • Identifier (DOI):
      10.1371/journal.pone.0324011
    • Identifier (PMID):
      40392871
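The abstract describes the Distance Split (DS) algorithm only at a high level. As a hypothetical illustration, not the authors' implementation, the sketch below assumes Levenshtein edit distance as the sequence-based metric and a hand-picked test set. It keeps a peptide for training only if it is at least `min_gap` edits from every test peptide and drops the rest, so the train/test distance is controlled explicitly. The 3D structure-based distances used in the paper are not modelled, and the helper names (`levenshtein`, `min_distance_to_set`, `distance_split`) and toy peptide strings are illustrative assumptions.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def min_distance_to_set(peptide: str, reference: List[str]) -> int:
    """Distance from `peptide` to its nearest neighbour in `reference`."""
    if not reference:
        raise ValueError("reference set must not be empty")
    return min(levenshtein(peptide, r) for r in reference)


def distance_split(peptides: List[str], test_peptides: List[str],
                   min_gap: int) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical distance-controlled split (not the published DS algorithm).

    Peptides listed in `test_peptides` form the test set. Every other peptide
    is kept for training only if it is at least `min_gap` edits away from all
    test peptides; peptides that are too close are dropped so near-duplicates
    of test peptides do not leak into training.
    """
    test_set = set(test_peptides)
    test = [p for p in peptides if p in test_set]
    train, dropped = [], []
    for p in peptides:
        if p in test_set:
            continue
        if min_distance_to_set(p, test) >= min_gap:
            train.append(p)
        else:
            dropped.append(p)
    return train, test, dropped


# Toy example: four 9-mer strings, one held out as the test peptide.
peptides = ["GILGFVFTL", "GILGFVFTV", "NLVPMVATV", "GLCTLVAML"]
train, test, dropped = distance_split(peptides, ["GILGFVFTL"], min_gap=3)
print("train:", train)      # peptides >= 3 edits from every test peptide
print("test:", test)
print("dropped:", dropped)  # peptides closer than 3 edits to a test peptide

# Nearest-training-peptide distance for a new peptide: a crude proxy for
# how far a prediction lies outside the training distribution.
print(min_distance_to_set("GILGFVFTA", train))
```

Raising `min_gap` widens the gap between training and test peptides at the cost of discarding more data. The nearest-neighbour distance from `min_distance_to_set` is one simple way to express how different a new peptide is from the training peptides, in the spirit of the confidence score the abstract proposes; how such a distance would be calibrated into a confidence value is not specified in this record.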