Efficient and robust search of microbial genomes via phylogenetic compression

  • Additional Information
    • Contributors:
      Scalable, Optimized and Parallel Algorithms for Genomics (GenScale); Centre Inria de l'Université de Rennes; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-GESTION DES DONNÉES ET DE LA CONNAISSANCE (IRISA-D7); Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA); Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes); Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique); Institut Mines-Télécom Paris (IMT)-Institut Mines-Télécom Paris (IMT)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes); Institut Mines-Télécom Paris (IMT)-Institut Mines-Télécom Paris (IMT)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA); Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique); Institut Mines-Télécom Paris (IMT)-Institut Mines-Télécom Paris (IMT); Department of Biomedical Informatics Harvard (DBMI); Harvard Medical School Boston (HMS); European Bioinformatics Institute Hinxton (EMBL-EBI); EMBL Heidelberg; Laboratoire d'Informatique Gaspard-Monge (LIGM); École nationale des ponts et chaussées (ENPC)-Centre National de la Recherche Scientifique (CNRS)-Université Gustave Eiffel; Algorithmes pour les séquences biologiques - Sequence Bioinformatics; Institut Pasteur Paris (IP)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Cité (UPCité); 
This work was partially supported by the NIGMS of the National Institutes of Health (R35GM133700), the David and Lucile Packard Foundation, the Pew Charitable Trusts, and the Alfred P. Sloan Foundation.; ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019)
    • Publisher Information:
      CCSD
      Nature Publishing Group
    • Publication Year:
      2025
    • Collection:
      École des Ponts ParisTech: HAL
    • Abstract:
      Comprehensive collections approaching millions of sequenced genomes have become central information sources in the life sciences. However, the rapid growth of these collections has made it effectively impossible to search these data using tools such as the Basic Local Alignment Search Tool (BLAST) and its successors. Here, we present a technique called phylogenetic compression, which uses evolutionary history to guide compression and efficiently search large collections of microbial genomes using existing algorithms and data structures. We show that, when applied to modern diverse collections approaching millions of genomes, lossless phylogenetic compression improves the compression ratios of assemblies, de Bruijn graphs and k-mer indexes by one to two orders of magnitude. Additionally, we develop a pipeline for a BLAST-like search over these phylogeny-compressed reference data, and demonstrate that it can align genes, plasmids or entire sequencing experiments against all bacteria sequenced until 2019 on ordinary desktop computers within a few hours. Phylogenetic compression has broad applications in computational biology and may provide a fundamental design principle for future genomics infrastructure.
    • Identifier:
      10.1038/s41592-025-02625-2
    • Online Access:
      https://hal.science/hal-04287842
      https://hal.science/hal-04287842v3/document
      https://hal.science/hal-04287842v3/file/phylogenetic_compression.pdf
      https://doi.org/10.1038/s41592-025-02625-2
    • Rights:
      http://creativecommons.org/licenses/by-nc/ ; info:eu-repo/semantics/OpenAccess
    • Identifier:
      edsbas.D6900ECF
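The abstract's core idea, letting evolutionary history decide the order in which genomes are compressed, can be illustrated with a toy Python sketch. It is not the authors' pipeline: the synthetic two-clade phylogeny, the substitution model, zlib (DEFLATE) as the stand-in low-level compressor, and all function names (`simulate`, `leaves_dfs`, `compare_orders`) are hypothetical choices made for illustration. Because DEFLATE can only refer back about 32 KiB, concatenating genomes in a left-to-right traversal of the tree keeps close relatives within reach of each other, whereas interleaving clades pushes them out of range.

```python
import random
import zlib

BASES = "ACGT"


def random_genome(length: int, rng: random.Random) -> str:
    return "".join(rng.choice(BASES) for _ in range(length))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    # Toy substitution model (assumption): each position is redrawn uniformly
    # with probability `rate`, so about 3/4 of redraws change the base.
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)


def simulate(n_clades=2, per_clade=4, length=20_000, rate=0.01, seed=0):
    """Hypothetical toy phylogeny: unrelated clade ancestors, mutated leaves.

    Returns (genomes, tree): genomes maps leaf name -> sequence, and tree is
    a nested tuple of leaf names with one sub-tuple per clade.
    """
    rng = random.Random(seed)
    genomes, clades = {}, []
    for c in range(n_clades):
        ancestor = random_genome(length, rng)
        names = []
        for g in range(per_clade):
            name = f"c{c}_g{g}"
            genomes[name] = mutate(ancestor, rate, rng)
            names.append(name)
        clades.append(tuple(names))
    return genomes, tuple(clades)


def leaves_dfs(tree):
    """Left-to-right leaf order of a tree encoded as nested tuples."""
    if isinstance(tree, str):
        return [tree]
    order = []
    for child in tree:
        order.extend(leaves_dfs(child))
    return order


def compressed_size(genomes, order) -> int:
    # Concatenate in the given order and compress with DEFLATE. Level 9 uses
    # long hash chains, so matches ~20 kB back are still found.
    data = "".join(genomes[name] for name in order).encode("ascii")
    return len(zlib.compress(data, 9))


def compare_orders(**kwargs):
    """Return (tree-ordered size, clade-interleaved size) in bytes."""
    genomes, tree = simulate(**kwargs)
    phylo = leaves_dfs(tree)
    # Interleave clades: c0_g0, c1_g0, c0_g1, c1_g1, ...
    interleaved = sorted(genomes, key=lambda n: (int(n.split("_g")[1]), n))
    return compressed_size(genomes, phylo), compressed_size(genomes, interleaved)


if __name__ == "__main__":
    phylo_bytes, interleaved_bytes = compare_orders()
    print(f"tree order: {phylo_bytes} B, interleaved: {interleaved_bytes} B")
```

Running the script prints both compressed sizes; on this toy data the tree-ordered concatenation should come out clearly smaller. Real collections are far larger and more diverse, so the gap seen here says nothing about the ratios reported in the abstract.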