Item request has been placed! ×
Item request cannot be made. ×
loading  Processing Request

Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs

Item request has been placed! ×
Item request cannot be made. ×
loading   Processing Request
  • معلومة اضافية
    • بيانات النشر:
      BMC Bioinformatics
    • الموضوع:
      2010
    • Collection:
      The University of Texas at Austin: Texas ScholarWorks
    • نبذة مختصرة :
      Vamsi K. Kundeti, Sanguthevar Rajasekaran, and Hieu Dinh are with the Department of Computer Science and Engineering, University of Connecticut,371 Fairfield Way, U-2155, Storrs, CT, 06269, USA. -- Matthew Vaughn is with the Texas Advanced Computing Center, University of Texas at Austin, TX, 78758, USA. -- Vishal Thapar is with Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Habor, NY, 11724, USA. ; Background: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). Results: In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(nlog(n/B)Blog(M/B)) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster - both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core ...
    • File Description:
      application/pdf
    • Relation:
      Kundeti, Vamsi K., Sanguthevar Rajasekaran, Hieu Dinh, Matthew Vaughn, and Vishal Thapar. “Efficient Parallel and out of Core Algorithms for Constructing Large Bi-Directed de Bruijn Graphs.” BMC Bioinformatics 11, no. 1 (November 15, 2010): 560. doi:10.1186/1471-2105-11-560.; http://hdl.handle.net/2152/27814; 1471-2105-11-560
    • الرقم المعرف:
      10.1186/1471-2105-11-560
    • الدخول الالكتروني :
      https://doi.org/10.1186/1471-2105-11-560
      http://hdl.handle.net/2152/27814
    • Rights:
      Administrative deposit of works to UT Digital Repository: This works author(s) is or was a University faculty member, student or staff member; this article is already available through open access at http://www.biomedcentral.com. The public license is specified as CC-BY: http://creativecommons.org/licenses/by/4.0/. The library makes the deposit as a matter of fair use (for scholarly, educational, and research purposes), and to preserve the work and further secure public access to the works of the University.
    • الرقم المعرف:
      edsbas.C98DEF8F