Inverted files versus suffix arrays for locating patterns in primary memory
Access Status
Authors
Date
2006Type
Metadata
Show full item recordCitation
Source Title
Source Conference
ISBN
Faculty
School
Remarks
The original publication is available at : http://www.springerlink.com
Collection
Abstract
Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we examine the resource requirements of compressed suffix array algorithms against compressed inverted file data structures for general pattern matching in genomic and English texts. In both cases, the inverted file indexes q-grams, thus allowing full pattern matching capabilities, rather than simple word based search, making their functionality equivalent to the compressed suffix array structures. When using equivalent memory for the two structures, inverted files are faster at reporting the location of patterns when the number of occurrences of the patterns is high.
Related items
Showing items related by title, author, creator and subject.
-
Puglisi, Simon; Smyth, Bill; Turpin, A. (2006)Recently the theoretical community has displayed a flurry of interest in suffix arrays, and compressed suffix arrays. New, asymptotically optimal algorithms for construction, search, and compression of suffix arrays have ...
-
Chen, G.; Puglisi, Simon; Smyth, B. (2008)For 30 years the Lempel-Ziv factorization LZ x of a string x = x[1..n] has been a fundamental data structure of string processing, especially valuable for string compression and for computing all the repetitions (runs) ...
-
Antonitio, A.; Ryan, P.; Smyth, Bill; Turpin, A.; Yu, X. (2004)In 2003 three (-)(n)-time algorithms were proposed for the construction of a suffix array of a string x = x[1..n] on an indexed alphabet, all of them inspired by the methodology of Farach's (-)(n)- time suffix tree ...