Iterative de Bruijn graph assemblers for second-generation sequencing reads

Authors: Peng, Yu; 彭煜
Source:
http://hub.hku.hk/bib/B50534051.
Subject:
Nucleotide sequence - Data processing; Sequence alignment (Bioinformatics)
Record type:
doctoral or postdoctoral thesis
Language:
English

Additional information
- Contributors:
  Chin, FYL
- Publication information:
  The University of Hong Kong (Pokfulam, Hong Kong)
- Publication year:
  2013
- Collection:
  University of Hong Kong: HKU Scholars Hub
- Abstract:
  The recent advance of second-generation sequencing technologies has made it possible to generate a vast amount of short read sequences from a DNA (cDNA) sample. Current short read assemblers make use of the de Bruijn graph, in which each vertex is a k-mer and each edge connecting vertex u and vertex v represents u and v appearing consecutively in a read, to produce contigs. There are three major problems for de Bruijn graph assemblers: (1) the branch problem, due to errors and repeats; (2) the gap problem, due to low or uneven sequencing depth; and (3) the error problem, due to sequencing errors. A proper choice of k value is a crucial tradeoff in de Bruijn graph assemblers: a low k value leads to fewer gaps but more branches; a high k value leads to fewer branches but more gaps. In this thesis, I first analyze the fundamental genome assembly problem and then propose an iterative de Bruijn graph assembler (IDBA), which iterates from low to high k values, to construct a de Bruijn graph with fewer branches and fewer gaps than any other de Bruijn graph assembler using a fixed k value. Then, second-generation sequencing data from metagenomic, single-cell and transcriptome samples is investigated. IDBA is then tailored with special treatments to handle the specific issues of each kind of data. For metagenomic sequencing data, a graph partition algorithm is proposed to separate the de Bruijn graph into dense components, which represent similar regions in subspecies of the same species, and multiple sequence alignment is used to produce a consensus for each component. For sequencing data with highly uneven depth, such as single-cell and metagenomic sequencing data, a method called local assembly is designed to reconstruct missing k-mers in low-depth regions. Then, based on the observation that short and relatively low-depth contigs are more likely to be erroneous, progressive depth on contigs is used to remove errors in both low-depth and high-depth regions iteratively. For transcriptome sequencing data, a variant of the progressive ...
- Relation:
  HKU Theses Online (HKUTO); Peng, Y. [彭煜]. (2012). Iterative de Bruijn graph assemblers for second-generation sequencing reads. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5053405; b5053405; http://hdl.handle.net/10722/188286
- DOI:
  10.5353/th_b5053405
- Online access:
  https://doi.org/10.5353/th_b5053405
  http://hdl.handle.net/10722/188286
- Rights:
  This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. ; The author retains all proprietary rights, (such as patent rights) and the right to use in future works.
- Accession number:
  edsbas.924486B3
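
The tradeoff in the choice of k described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's IDBA implementation: the function names (`build_de_bruijn`, `count_branches`, `count_components`), the toy reads, and the use of weakly connected components as a rough stand-in for gaps are all assumptions made for illustration. The graph follows the abstract's definition: each vertex is a k-mer, and an edge from u to v means u and v appear consecutively in a read.

```python
def build_de_bruijn(reads, k):
    """Build a de Bruijn graph as {k-mer: set of successor k-mers}.

    Vertices are k-mers; an edge u -> v is added when u and v
    occur at consecutive positions in at least one read.
    """
    graph = {}
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            graph.setdefault(kmer, set())
        for u, v in zip(kmers, kmers[1:]):
            graph[u].add(v)
    return graph


def count_branches(graph):
    """Count vertices with more than one outgoing edge."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


def count_components(graph):
    """Count weakly connected components (assumed proxy for gaps:
    more than one component means the reads no longer link up)."""
    undirected = {u: set(vs) for u, vs in graph.items()}
    for u, vs in graph.items():
        for v in vs:
            undirected[v].add(u)
    seen = set()
    components = 0
    for start in undirected:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in undirected[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


# Hypothetical toy reads from the sequence ACGTTGCACGTAAGC, which
# contains the repeat ACGT; adjacent reads overlap by 4 bases.
READS = ["ACGTTGCA", "TGCACGTA", "CGTAAGC"]

if __name__ == "__main__":
    for k in (3, 5):
        g = build_de_bruijn(READS, k)
        print(k, count_branches(g), count_components(g))
```

On this toy input, k = 3 yields one branch (at the repeat) in a single connected graph, while k = 5 removes the branch but splits the graph into three components, because adjacent reads overlap by only four bases. An iterative scheme of the kind the abstract describes aims to combine the connectivity of low k with the branch resolution of high k; how IDBA does so is not detailed in this record.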
