Pairwise Sequence Alignment using Bio-Database Compression by Improved Fine Tuned Enhanced Suffix Array.


Authors: Kunthavai, Arumugam¹; Vasantharathna, Somasundaram²; Thirumurugan, Swaminathan¹
Source:
International Arab Journal of Information Technology (IAJIT). Jul 2015, Vol. 12 Issue 4, p352-359. 8p.
Subject:
*SEQUENCE alignment; *DATABASES; *DATA compression; *SUFFIXES & prefixes (Grammar); *APPLICATION software

Additional information
- Abstract:
  Sequence alignment is a bioinformatics application that determines the degree of similarity between nucleotide sequences that are assumed to share a common ancestor. A sequence alignment method reads a query sequence from the user, aligns it against large genomic sequence data sets, and locates targets that are similar to the query. Existing accurate algorithms, such as Smith-Waterman and FASTA, are computationally very expensive, which limits their use in practice. Existing search tools, such as BLAST and WU-BLAST, employ heuristics to speed up such searches. However, these heuristics can sometimes miss targets, which is undesirable in many cases. Given the rapid growth of database sizes, the problem demands ever-growing computational resources and remains a computational challenge. The most common sequence alignment tools, such as BLAST, WU-BLAST and the Sequence Comparison Tool (SCT), search a given query sequence against a set of database sequences. In this paper, the Biological Data Base Compression Tool using Minimum Perfect Hash Function (BioDBMPHF) has been developed to find pairwise local sequence alignments by preprocessing the database. Preprocessing finds the Longest Common Substring (LCS) in the database sequences that have the highest local similarity with a given query sequence, and reduces the size of the database based on frequent common subsequences. In the BioDBMPHF tool, a fine-tuned enhanced suffix array is constructed and used to find the LCS. Experimental results show that the hash index algorithm reduces the time and space complexity of accessing the LCS. The time complexity of finding the LCS with the hash index algorithm is O(2+?), where '?' is the time taken to access the pattern. The space complexity of the fine-tuned enhanced suffix array is 5n bytes per character for the reduced enhanced Longest Common Prefix (LCP) table, and storing the bucket table requires 32 bytes. Data mining techniques are used to cross-validate the results. It is shown that the developed BioDBMPHF tool compresses the database effectively and obtains the same results as traditional algorithms in approximately half the time they take, thereby reducing the time complexity. [ABSTRACT FROM AUTHOR]
- Abstract:
  Copyright of International Arab Journal of Information Technology (IAJIT) is the property of Colleges of Computing & Information Society and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
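
The abstract describes finding a Longest Common Substring (LCS) with a suffix array and an LCP table. The following is a minimal sketch of that general idea for one query and one target sequence, not the authors' BioDBMPHF tool: it uses a plain suffix array built by naive sorting and Kasai's LCP construction, and it omits the fine-tuned enhanced suffix array, the bucket table, and the hash index. All function names and the `#` separator are assumptions chosen for illustration.

```python
def build_suffix_array(text):
    """Return suffix start positions in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp(text, sa):
    """Kasai's algorithm: lcp[k] is the length of the common prefix of
    the suffixes starting at sa[k - 1] and sa[k]; lcp[0] is 0."""
    n = len(text)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def longest_common_substring(query, target, sep="#"):
    """Return one longest substring shared by query and target.

    Assumption: `sep` occurs in neither sequence, so no match can
    run across the boundary between the two sequences.
    """
    if sep in query or sep in target:
        raise ValueError("separator must not occur in the sequences")
    text = query + sep + target
    sa = build_suffix_array(text)
    lcp = build_lcp(text, sa)
    boundary = len(query)
    best_len, best_start = 0, 0
    for k in range(1, len(text)):
        # Only adjacent suffixes that start in different sequences count.
        if (sa[k - 1] < boundary) != (sa[k] < boundary) and lcp[k] > best_len:
            best_len, best_start = lcp[k], sa[k]
    return text[best_start:best_start + best_len]


if __name__ == "__main__":
    print(longest_common_substring("GATTACA", "TACAGA"))
```

The naive suffix sort keeps the sketch short but is far slower than the constructions used in practice for genomic data; when several common substrings tie for the longest length, the function returns whichever one it encounters first in suffix-array order.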
