Reusability report: Learning the transcriptional grammar in single-cell RNA-sequencing data using transformers

Authors: Khan, Sumeer Ahmad; Maillo, Alberto; Lagani, Vincenzo; Lehmann, Robert; Kiani, Narsis A.; Gomez-Cabrero, David; Tegner, Jesper
Record Type:
article in journal/newspaper
Language:
unknown

Additional Information
- Contributors:
  SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, Thuwal, Saudi Arabia; Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Biological and Environmental Science and Engineering (BESE) Division; Bioscience Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Institute of Chemical Biology, Ilia State University, Tbilisi, Georgia; Algorithmic Dynamic Lab, Department of Oncology and Pathology, Karolinska Institute, Stockholm, Sweden; Unit of Computational Medicine, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden; Translational Bioinformatics Unit, Navarrabiomed, Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain; Science for Life Laboratory, Solna, Sweden
- Publication Information:
  Springer Science and Business Media LLC
- Publication Year:
  2023
- Collection:
  King Abdullah University of Science and Technology: KAUST Repository
- Abstract:
  The rise of single-cell genomics is an attractive opportunity for data-hungry machine learning algorithms. The scBERT method, inspired by the success of BERT (‘bidirectional encoder representations from transformers’) in natural language processing, was recently introduced by Yang et al. as a data-driven tool to annotate cell types in single-cell genomics data. Analogous to contextual embedding in BERT, scBERT leverages pretraining and self-attention mechanisms to learn the ‘transcriptional grammar’ of cells. Here we investigate the reusability beyond the original datasets, assessing the generalizability of natural language techniques in single-cell genomics. The degree of imbalance in the cell-type distribution substantially influences the performance of scBERT. Anticipating an increased utilization of transformers, we highlight the necessity to consider data distribution carefully and introduce a subsampling technique to mitigate the influence of an imbalanced distribution. Our analysis serves as a stepping stone towards understanding and optimizing the use of transformers in single-cell genomics.
  King Abdullah University of Science and Technology supported this work, which was also partially funded by the SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence.
- File Description:
  application/pdf
- ISSN:
  2522-5839
- Relation:
  2-s2.0-85176805400; Nature Machine Intelligence; http://hdl.handle.net/10754/695772
- DOI:
  10.1038/s42256-023-00757-8
- Online Access:
  http://hdl.handle.net/10754/695772
  https://doi.org/10.1038/s42256-023-00757-8
- Rights:
  Archived with thanks to Nature Machine Intelligence under a Creative Commons license, details at: https://creativecommons.org/licenses/by/4.0
- Accession Number:
  edsbas.C0DA4AAE
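
The abstract mentions a subsampling technique for imbalanced cell-type distributions but does not describe it. As a rough illustration only, and not the authors' procedure, the Python sketch below caps each cell type at a fixed number of cells before training. The function name `subsample_balanced`, the `max_per_class` parameter, and the toy labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def subsample_balanced(labels, max_per_class, seed=0):
    """Return sorted indices keeping at most `max_per_class` cells per label.

    Hypothetical helper: a generic class-capping subsample, not the
    method used in the report.
    """
    rng = random.Random(seed)

    # Group cell indices by their annotated cell type.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    kept = []
    for indices in by_label.values():
        # Downsample only the over-represented cell types.
        if len(indices) > max_per_class:
            indices = rng.sample(indices, max_per_class)
        kept.extend(indices)
    return sorted(kept)


# Toy, made-up label vector with a strongly imbalanced distribution.
labels = ["T cell"] * 900 + ["B cell"] * 90 + ["NK cell"] * 10
kept = subsample_balanced(labels, max_per_class=50)
counts = Counter(labels[i] for i in kept)
```

Capping the majority classes leaves rare cell types untouched, so the retained set is less skewed; whether a cap, weighting, or oversampling is preferable depends on the dataset.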
