Abstract: *Background* Massively parallel DNA sequencing technologies provide exponential increases in the amount of data returned from the sequencing process, relative to traditional (Sanger-based) techniques. Attaching a unique, synthetic oligonucleotide (an identifying sequence tag) to each sample enables the deconvolution of samples pooled prior to massively parallel sequencing. To counteract oligonucleotide synthesis and sequencing errors, sequence tags should be drawn from combinations with sufficient differences and with an appropriate error-correcting code over the alphabet {A, C, G, T}. This ensures that errors within the tags do not cause sequences to be assigned to the wrong sample, while also enabling correction and recovery of incorrectly sequenced or incorrectly synthesized tags. The set of available tags should be large, allowing sample multiplexing to scale with rapid changes in sequencing platform output. The set of tags should also account for the errors possible during both oligonucleotide synthesis and sequencing. Most tags in current use have been designed to maintain a particular Hamming distance, a scheme that ensures the distance between tag sequences is maintained in the presence of substitutions only. However, sequence identification tags conforming to the edit metric are more appropriate: edit-metric sequence tags are robust to insertion, deletion, and substitution errors. *Results* We present edittag, a Python package containing several tools to facilitate the design of edit-metric-based sequence identification tags, check existing sets of sequence identification tags for conformance to the edit metric, and apply sequence identification tags to primers and/or adapters. We use edittag to design several large sets of edit-metric sequence tags ranging from four to ten nucleotides in length, with edit distances of three to nine. Finally, we test a set of fusion primers designed with the software developed here, demonstrating high ...
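The difference between the two metrics can be made concrete with a minimal sketch. The function names below (`hamming`, `edit_distance`, `min_pairwise_distance`) are illustrative assumptions and are not taken from edittag's API; the edit distance is computed with the standard Levenshtein dynamic program, which may differ from how edittag implements it.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions, and deletions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def min_pairwise_distance(tags, metric):
    """Smallest distance between any two tags in the set under `metric`."""
    return min(metric(a, b) for a, b in combinations(tags, 2))


# Hypothetical tag pair: every position differs, yet one deletion plus
# one insertion turns the first tag into the second.
tag_a, tag_b = "ACGTAC", "CGTACA"
print(hamming(tag_a, tag_b), edit_distance(tag_a, tag_b))  # prints: 6 2
```

The pair looks maximally separated under the Hamming metric but sits only two edits apart, which is why a tag set validated by Hamming distance alone can still be vulnerable to insertion and deletion errors. Calling `min_pairwise_distance(tags, edit_distance)` on a candidate set gives a simple conformance check of the kind the abstract describes.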