CStone: A de novo transcriptome assembler for short-read data that identifies non-chimeric contigs based on underlying graph structure


Authors: John Archer; Raquel Linheiro
Source:
PLoS Computational Biology, Vol 17, Iss 11, p e1009631 (2021)
PLoS Computational Biology
Subjects:
Canaries; Sequence assembly; Datasets as Topic; Gene Expression; Database and Informatics Methods; Biology (General); De Bruijn sequence; Mammals; Ecology; Contig; Drosophila Melanogaster; Simulation and Modeling; Eukaryota; Genomics; Animal Models; Insects; Computational Theory and Mathematics; Experimental Organism Systems; Modeling and Simulation; Vertebrates; Drosophila; Transcriptome Analysis; Sequence Analysis; Research Article; DNA, Complementary; Arthropoda; Bioinformatics; QH301-705.5; Sequence alignment; Computational biology; Biology; Research and Analysis Methods; Chimerism; Birds; Cellular and Molecular Neuroscience; Model Organisms; Complementary DNA; Genetics; Gene family; Animals; Molecular Biology; Ecology, Evolution, Behavior and Systematics; Leopards; cDNA library; Sequence Analysis, RNA; Organisms; Biology and Life Sciences; Computational Biology; Genome Analysis; Invertebrates; Amniotes; Animal Studies; Cats; Transcriptome; Zoology; Entomology; Sequence Alignment; Software
Language:
English
Online access:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f697f594bc7fcee3f2f58325614671a6
https://doaj.org/article/ad8577de265f4eb691bc78de00cc473b

Additional information
- Publication info:
  Public Library of Science (PLoS), 2021.
- Subject:
  2021
- Abstract:
  With the exponential growth of sequence information stored over the last decade, including that of de novo assembled contigs from RNA-Seq experiments, quantification of chimeric sequences has become essential when assembling read data. In transcriptomics, de novo assembled chimeras can closely resemble underlying transcripts, but patterns such as those seen between co-evolving sites, or mapped read counts, become obscured. We have created a de Bruijn graph-based de novo assembler for RNA-Seq data that utilizes a classification system to describe the complexity of the underlying graphs from which contigs are created. Each contig is labelled with one of three levels, indicating whether or not ambiguous paths exist. A by-product of this is information on the range of complexity of the underlying gene families present. As a demonstration of CStone's ability to assemble high-quality contigs, and to label them in this manner, both simulated and real data were used. For simulated data, ten million read pairs were generated from cDNA libraries representing four species: Drosophila melanogaster, Panthera pardus, Rattus norvegicus and Serinus canaria. These were assembled using CStone, Trinity and rnaSPAdes, the latter two being high-quality, well-established de novo assemblers. For real data, two RNA-Seq datasets, each consisting of ≈30 million read pairs and representing two adult D. melanogaster whole-body samples, were used. The contigs that CStone produced were comparable in quality to those of Trinity and rnaSPAdes in terms of length, sequence identity of aligned regions and the range of cDNA transcripts represented, whilst providing additional information on chimerism. Here we describe the details of CStone's assembly and classification process, and propose that similar classification systems can be incorporated into other de novo assembly tools. Within a related side study, we explore the effects that chimeras within reference sets have on the identification of differentially expressed genes. CStone is available at: https://sourceforge.net/projects/cstone/.
  Author summary: Within transcriptome reference sets, non-chimeric sequences are representations of transcribed genes, while artificially generated chimeric ones are mosaics of two or more pieces of DNA incorrectly pieced together. One area where such sets are utilized is in the quantification of gene expression patterns, where RNA-Seq reads are mapped to the sequences within the set and the subsequent count values reflect expression levels. Artificial chimeras can have a negative impact on count values by erroneously increasing variation in relation to the reads being mapped. Reference sets can be created from de novo assembled contigs, but chimeras can be introduced during the assembly process via the required traversal of graphs, representing gene families, constructed from the RNA-Seq data. Graph complexity determines how likely chimeras are to arise. We have created CStone, a de novo assembler that utilizes a classification system to describe such complexity. Contigs created by CStone are labelled in a manner that indicates whether or not they are non-chimeric. This encourages contig-dependent results to be presented with increased objectivity by maintaining the context of ambiguity associated with the assembly process. CStone has been tested extensively. Additionally, we have quantified the relationship between chimeras within reference sets and the identification of differentially expressed genes.
- ISSN:
  1553-7358
- Rights:
  OPEN
- Accession number:
  edsair.doi.dedup.....f697f594bc7fcee3f2f58325614671a6
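
The abstract's central idea, labelling each contig by the complexity of the de Bruijn graph it was built from, can be illustrated with a minimal sketch. This is not CStone's implementation: the record does not state the criteria behind the three levels, so the rule used here (counting branching nodes against a hypothetical `max_simple_branches` threshold) and all function names are assumptions made purely for illustration. Reverse complements, sequencing errors and coverage are ignored.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build successor and predecessor maps over (k-1)-mer nodes."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            succ[left].add(right)
            pred[right].add(left)
    return succ, pred


def components(succ, pred):
    """Group nodes into weakly connected components (one per subgraph)."""
    nodes = set(succ) | set(pred)
    seen = set()
    comps = []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        comp = set()
        while stack:
            node = stack.pop()
            comp.add(node)
            # Follow edges in both directions so direction does not split a component.
            for nxt in succ.get(node, set()) | pred.get(node, set()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps


def classify(comp, succ, pred, max_simple_branches=2):
    """Assign an illustrative complexity level (1, 2 or 3) to a component.

    Assumed rule, not taken from the paper:
      1 = no branching node, so only one path exists (no ambiguity);
      2 = a small number of branching nodes;
      3 = many branching nodes, so chimeric paths are more likely.
    """
    branches = sum(
        1 for n in comp
        if len(succ.get(n, ())) > 1 or len(pred.get(n, ())) > 1
    )
    if branches == 0:
        return 1
    if branches <= max_simple_branches:
        return 2
    return 3


def label_components(reads, k):
    """Return (sorted nodes, level) for every component of the read graph."""
    succ, pred = build_de_bruijn(reads, k)
    return [(sorted(c), classify(c, succ, pred)) for c in components(succ, pred)]


if __name__ == "__main__":
    toy_reads = ["AAGCTT", "GGACTC", "GGACTA"]
    for nodes, level in label_components(toy_reads, k=4):
        print(level, nodes)
```

In the toy input, the first read forms a linear path with a single possible traversal, while the other two reads diverge after a shared prefix, which creates one branching node. A contig drawn from the second component would therefore carry a label signalling that an alternative path existed when it was assembled.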
