DEL-Thyroid: deep ensemble learning framework for detection of thyroid cancer progression through genomic mutation.

  • Additional information
    • Source:
      Publisher: BioMed Central Country of Publication: England NLM ID: 101088682 Publication Model: Electronic Cited Medium: Internet ISSN: 1472-6947 (Electronic) Linking ISSN: 14726947 NLM ISO Abbreviation: BMC Med Inform Decis Mak Subsets: MEDLINE
    • Publication data:
      Original Publication: London : BioMed Central, [2001-
    • Subject:
    • Abstract:
      Genes, expressed as sequences of nucleotides, are susceptible to mutations, some of which can lead to cancer. Machine learning and deep learning methods have emerged as vital tools for identifying mutations associated with cancer. Thyroid cancer ranks as the fifth most prevalent cancer in the USA, with thousands diagnosed annually. This paper presents an ensemble learning model that combines deep learning techniques, namely Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), and Bi-directional LSTM (Bi-LSTM), to detect thyroid cancer mutations early. The model is trained on a dataset sourced from asia.ensembl.org and IntOGen.org, consisting of 633 samples with 969 mutations across 41 genes, collected from individuals of various demographics. Feature extraction uses Hahn moments, central moments, raw moments, and various matrix-based methods. Evaluation employs three testing methods: the self-consistency test (SCT), the independent set test (IST), and the 10-fold cross-validation test (10-FCVT). The proposed ensemble learning model demonstrates promising performance, achieving 96% accuracy in the IST. Statistical measures such as training accuracy, testing accuracy, recall, sensitivity, specificity, Matthews correlation coefficient (MCC), loss, F1 score, and Cohen's kappa are used for comprehensive evaluation.
      (© 2024. The Author(s).)
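
The evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' evaluation code; the function name `binary_metrics` and the toy labels are assumptions made for illustration only.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Return common confusion-matrix metrics for 0/1 labels (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four confusion-matrix cells.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    # Sensitivity is the same quantity as recall: TP / (TP + FN).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0

    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Toy labels (1 = mutated, 0 = normal), invented for illustration only.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_result = binary_metrics(toy_true, toy_pred)
```

On a balanced toy set like this one, MCC and Cohen's kappa coincide. On imbalanced data they can diverge from accuracy, which is one reason to report them alongside it.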
    • Contributed Indexing:
      Keywords: Bi-directional LSTM (Bi-LSTM); Deep learning; Ensemble learning model (ELM); Gated recurrent units (GRUs); Long short-term memory (LSTM); Mutation detection; Thyroid Cancer
    • Subject:
      Date Created: 20240722 Date Completed: 20240722 Latest Revision: 20240722
    • Subject:
      20240723
    • Identifier:
      10.1186/s12911-024-02604-1
    • Identifier:
      39039464