Item request has been placed! ×
Item request cannot be made. ×
loading  Processing Request

Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications

Item request has been placed! ×
Item request cannot be made. ×
loading   Processing Request
  • معلومة اضافية
    • Contributors:
      Centrum voor Wiskunde en Informatica (CWI); Centrum Wiskunde & Informatica (CWI)-Netherlands Organisation for Scientific Research; University of Pisa - Università di Pisa; Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE); Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE); Université Claude Bernard Lyon 1 (UCBL); Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL); Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Inria Lyon; Institut National de Recherche en Informatique et en Automatique (Inria); Vrije Universiteit Amsterdam Amsterdam (VU); Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.; Institute of Informatics; Université de Varsovie-Université de Varsovie
    • بيانات النشر:
      HAL CCSD
    • الموضوع:
      2023
    • Collection:
      Université de Lyon: HAL
    • الموضوع:
    • نبذة مختصرة :
      International audience ; An elastic-degenerate (ED) string T is a sequence of n sets T [1],. . , T [n] containing m strings in total whose cumulative length is N. We call n, m, and N the length, the cardinality and the size of T , respectively. The language of T is defined as L(T) = {S1 • • • Sn : Si ∈ T [i] for all i ∈ [1, n]}. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: Given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings T1 and T2 of lengths n1 and n2, cardinalities m1 and m2, and sizes N1 and N2, respectively, we show the following: There is no O((N1N2) 1-ϵ)-time algorithm, thus no O (N1m2 + N2m1) 1-ϵ-time algorithm and no O (N1n2 + N2n1) 1-ϵ-time algorithm, for any constant ϵ > 0, for EDSI even when T1 and T2 are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false. There is no combinatorial O((N1 + N2) 1.2-ϵ f (n1, n2))-time algorithm, for any constant ϵ > 0 and any function f , for EDSI even when T1 and T2 are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false. An O(N1 log N1 log n1 + N2 log N2 log n2)-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when T1 and T2 are given in a compact representation, we show that the problem is NP-complete. An O(N1m2 + N2m1)-time algorithm for EDSI. An Õ(N ω-1 1 n2 + N ω-1 2 n1)-time algorithm for EDSI, where ω is the exponent of matrix multiplication; the Õ notation suppresses factors that are polylogarithmic in the input size. We also show that the techniques we develop have applications outside of ED string comparison.
    • Relation:
      hal-04365687; https://inria.hal.science/hal-04365687; https://inria.hal.science/hal-04365687/document; https://inria.hal.science/hal-04365687/file/LIPIcs.CPM.2023.11.pdf
    • الرقم المعرف:
      10.4230/LIPIcs.CPM.2023.11
    • Rights:
      http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
    • الرقم المعرف:
      edsbas.8FF1F908