
Hybrid fragment-SMILES tokenization for ADMET prediction in drug discovery.

  • Additional Information
    • Source:
      Publisher: BioMed Central Country of Publication: England NLM ID: 100965194 Publication Model: Electronic Cited Medium: Internet ISSN: 1471-2105 (Electronic) Linking ISSN: 14712105 NLM ISO Abbreviation: BMC Bioinformatics Subsets: MEDLINE
    • Publication Information:
      Original Publication: [London] : BioMed Central, 2000-
    • Subject:
    • Abstract:
      Background: Drug discovery and development is an extremely costly and time-consuming process of identifying new molecules that can interact with a biomarker target to interrupt the disease pathway of interest. In addition to binding the target, a drug candidate must satisfy multiple properties affecting absorption, distribution, metabolism, excretion, and toxicity (ADMET). Artificial intelligence approaches offer an opportunity to improve each step of this process, and the first question to address is how a molecule can be represented informatively so that in-silico solutions are optimized.
      Results: This study introduces a novel hybrid SMILES-fragment tokenization method, coupled with two pre-training strategies, utilizing a Transformer-based model. We investigate the efficacy of hybrid tokenization in improving the performance of ADMET prediction tasks. Our approach leverages MTL-BERT, an encoder-only Transformer model that achieves state-of-the-art ADMET predictions, and contrasts the standard SMILES tokenization with our hybrid method across a spectrum of fragment library cutoffs.
      Conclusion: The findings reveal that while an excess of fragments can impede performance, hybrid tokenization with high-frequency fragments improves results beyond the base SMILES tokenization. This advancement underscores the potential of integrating fragment- and character-level molecular features in the training of Transformer models for ADMET property prediction.
      (© 2024. The Author(s).)
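
The abstract does not spell out how the hybrid tokens are built, so the following is only a minimal sketch of the general idea under stated assumptions: fragments are given as SMILES strings, the "fragment library cutoff" is modeled as a minimum occurrence count, and fragments are matched greedily (longest first) against the atom-level token sequence. The function names `atom_tokens`, `build_fragment_vocab`, and `hybrid_tokenize`, and the toy fragment lists, are hypothetical and are not taken from the paper or from MTL-BERT.

```python
import re
from collections import Counter

# Commonly used atom-level SMILES pattern: bracket atoms, two-letter
# halogens, two-digit ring closures, organic-subset atoms, bonds, digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[=#$/\\\-+().:~@?>*]|\d)"
)


def atom_tokens(smiles):
    """Split a SMILES string into atom-level tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_fragment_vocab(fragment_lists, min_count):
    """Keep fragments seen at least `min_count` times (assumed cutoff rule)."""
    counts = Counter(f for frags in fragment_lists for f in frags)
    return {f for f, c in counts.items() if c >= min_count}


def hybrid_tokenize(smiles, fragment_vocab):
    """Emit fragment tokens where a library fragment matches, else atom tokens.

    Matching is done on token sequences, not raw characters, so a fragment
    can never split a multi-character atom such as 'Cl'.
    """
    atoms = atom_tokens(smiles)
    frag_seqs = sorted(
        {tuple(atom_tokens(f)) for f in fragment_vocab}, key=len, reverse=True
    )
    out, i = [], 0
    while i < len(atoms):
        for seq in frag_seqs:
            if seq and tuple(atoms[i:i + len(seq)]) == seq:
                out.append("".join(seq))  # one token for the whole fragment
                i += len(seq)
                break
        else:
            out.append(atoms[i])  # fall back to the atom-level token
            i += 1
    return out


# Toy, made-up fragment lists for three molecules (illustration only).
toy_fragments = [
    ["c1ccccc1", "C(=O)O"],
    ["c1ccccc1", "CC"],
    ["c1ccccc1", "C(=O)O"],
]
vocab = build_fragment_vocab(toy_fragments, min_count=2)
tokens = hybrid_tokenize("CC(=O)Oc1ccccc1C(=O)O", vocab)
print(tokens)
```

Raising `min_count` shrinks the fragment vocabulary toward plain atom-level tokenization, while lowering it admits rarer fragments; this is the trade-off the Conclusion refers to when it notes that an excess of fragments can impede performance.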
    • Grant Information:
      AI4D-108 National Research Council Canada; AI4D-108 National Research Council Canada; RGPIN-2021-03879 Natural Sciences and Engineering Research Council of Canada; RGPIN-2022-05418 Natural Sciences and Engineering Research Council of Canada; 2021-00214 Canada Research Chair Program; 42115 Canada Foundation for Innovation
    • Contributed Indexing:
      Keywords: ADMET prediction; Drug discovery; Fragments; SMILES; Transformer
    • Substance Nomenclature:
      0 (Small Molecule Libraries)
      0 (Pharmaceutical Preparations)
    • Entry Date(s):
      Date Created: 20240801 Date Completed: 20240802 Latest Revision: 20240804
    • Update Code:
      20240804
    • PubMed Central ID:
      PMC11295479
    • DOI:
      10.1186/s12859-024-05861-z
    • PMID:
      39090573