End-to-end sequence-structure-function meta-learning predicts genome-wide chemical-protein interactions for dark proteins.

  • Additional Information
    • Abstract:
      Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain "dark", i.e., their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy to reduce the impact of inaccuracy of predicted structures on function predictions by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step, using different gene families in the test data from those in the training and development data sets to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art techniques of machine learning and protein-ligand docking when applied to dark gene families, and demonstrated its generalization power for target identifications and compound screenings under out-of-distribution (OOD) scenarios. Furthermore, in an external validation for the multi-target compound screening, the performance of PortalCG surpassed the rational design from medicinal chemists. Our results also suggest that a differentiable sequence-structure-function deep learning framework, where protein structural information serves as an intermediate layer, could be superior to conventional methodology where predicted protein structures were used for the compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggested that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space.
      Author summary: Many complex diseases, such as Alzheimer's disease, mental disorders, and substance use disorders, do not have safe and effective therapeutics because of the polygenic nature of the diseases and a lack of thoroughly validated drug targets (and their corresponding ligands). Identifying small-molecule ligands for all proteins encoded in the human genome would provide powerful new opportunities for drug discovery of currently untreatable diseases. However, the small-molecule ligand of more than 90% of gene families is completely unknown. Existing protein-ligand docking and machine learning methods often fail when the protein of interest is dissimilar to those with known functions or structures. We have developed a new deep learning framework, PortalCG, for efficiently and accurately predicting ligands of understudied proteins which are out of reach of existing methods. Our method achieves unprecedented accuracy versus state-of-the-art approaches, and it achieves this by incorporating ligand binding site information and the sequence-to-structure-to-function paradigm into a novel deep meta-learning algorithm. In a case study, the performance of PortalCG surpassed the rational design from medicinal chemists. The proposed computational framework can shed new light on how chemicals modulate biological systems, which is indispensable in drug repurposing and rational design of polypharmacology. This approach could offer a new way to develop safe and effective therapeutics for currently incurable diseases. PortalCG can be extended to other types of tasks, such as predicting protein-protein interactions and protein-nucleic acid recognition. [ABSTRACT FROM AUTHOR]
    • Abstract:
      Copyright of PLoS Computational Biology is the property of Public Library of Science and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
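
Component (iv) of the abstract, the stress model selection step, evaluates on gene families that do not appear in the training and development data. The idea of such a family-wise split can be illustrated with a minimal sketch. This is not PortalCG's code: the function name `split_by_family`, the record fields (`protein`, `family`, `ligand`, `label`), the split fractions, and the toy data are all assumptions made for illustration. Keeping the development families disjoint from the training families is a further assumption; the abstract only states that the test families differ from the other two.

```python
import random
from collections import defaultdict


def split_by_family(records, dev_frac=0.2, test_frac=0.2, seed=0):
    """Split records so that no gene family is shared across splits.

    `records` is a list of dicts with a "family" key (hypothetical schema).
    Whole families, not individual protein-ligand pairs, are assigned to
    train, dev, or test, so the test set is out-of-distribution with
    respect to family membership.
    """
    by_family = defaultdict(list)
    for rec in records:
        by_family[rec["family"]].append(rec)

    # Sort before shuffling so the result depends only on the seed.
    families = sorted(by_family)
    random.Random(seed).shuffle(families)

    n_test = max(1, round(len(families) * test_frac))
    n_dev = max(1, round(len(families) * dev_frac))
    if n_test + n_dev >= len(families):
        raise ValueError("not enough families to leave any for training")

    assignment = {
        "test": families[:n_test],
        "dev": families[n_test:n_test + n_dev],
        "train": families[n_test + n_dev:],
    }
    return {
        name: [rec for fam in fams for rec in by_family[fam]]
        for name, fams in assignment.items()
    }


# Toy data: 10 made-up families with 3 protein-ligand pairs each.
records = [
    {"protein": f"P{f}_{i}", "family": f"fam{f}", "ligand": f"L{i}", "label": (f + i) % 2}
    for f in range(10)
    for i in range(3)
]
splits = split_by_family(records)
family_sets = {name: {r["family"] for r in recs} for name, recs in splits.items()}
```

A random split over individual pairs would instead place members of the same family on both sides, which tends to overstate performance on proteins unlike any seen in training.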