Abstract: The number of available completely sequenced genomes has grown exponentially in the last two decades. Today, the total number of DNA sequences stored in public databases doubles about every 18 months, a development fuelled by continuous improvements in DNA sequencing technologies. Next-generation sequencing (NGS) has caused a dramatic drop in sequencing costs, which has not only propelled the growth of available DNA sequences in public databases but also encouraged the establishment of metagenomic community-sequencing approaches. Full genome sequencing is restricted to cultivable strains. Because only a minor fraction of the microbial species in a given habitat can be cultivated with current techniques, metagenomics, the sequencing of DNA from an environmental sample, is the method of choice. The huge amount of data that has to be processed in metagenomic projects raises new challenges, especially for the classic metagenomic problems of binning and classification. For example, the direct sequencing of microbial communities using NGS technologies often yields longer assemblies of the abundant species and a wealth of sequences that have to be taxonomically clustered into bins (taxobins); the same applies to standard Sanger sequencing. This approach requires methods that can taxonomically classify sequences with reasonable accuracy. Binning denotes the process of clustering metagenomic sequences according to certain features and parameters, while classification denotes the assignment of metagenomic sequences to known organisms and taxonomic groups. The aim of this thesis was to aid the analysis of metagenomic data with respect to the classification and binning tasks in metagenomic projects.
First, the technology for binning and classification was set up by implementing software capable of taxonomic classification and binning of metagenomic sequence fragments, with the aim of making this software able to handle the amount of sequence data present in today's public databases. A ...