Inverted files versus suffix arrays for locating patterns in primary memory

Puglisi, Simon; Smyth, William; Turpin, Andrew

doi:10.1007/11880561_11

Access Status

Fulltext not available

Authors

Puglisi, Simon

Smyth, William

Turpin, Andrew

Date

2006

Type

Conference Paper

Citation

Puglisi, Simon J. and Smyth, W.F. and Turpin, Andrew. 2006. Inverted files versus suffix arrays for locating patterns in primary memory, in Fabio Crestani (ed), 13th Symposium on String Processing and Information Retrieval (SPIRE), Oct 11 2006, pp. 122-133. Glasgow, UK: Springer.

Source Title

Inverted files versus suffix arrays for locating patterns in primary memory

Source Conference

13th Symposium on String Processing and Information Retrieval (SPIRE)

DOI

10.1007/11880561_11

ISBN

978-3-540-45774-9

Faculty

Curtin Business School

The Digital Ecosystems and Business Intelligence Institute (DEBII)

School

Other

Remarks

The original publication is available at: http://www.springerlink.com

URI

http://hdl.handle.net/20.500.11937/43428

Collection

Curtin Research Publications

Abstract

Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we compare the resource requirements of compressed suffix array algorithms with those of compressed inverted file data structures for general pattern matching in genomic and English texts. In both cases, the inverted file indexes q-grams rather than words, so it supports full pattern matching instead of simple word-based search, and its functionality is equivalent to that of the compressed suffix array structures. When the two structures use equivalent memory, inverted files are faster at reporting the locations of patterns when the number of occurrences of the patterns is high.
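
For readers unfamiliar with the two structures, the following is a minimal, uncompressed sketch of how each can locate a pattern in memory. It is not the authors' implementation: the paper studies compressed variants, and the function names, the choice of q, and the list-intersection strategy here are illustrative assumptions only.

```python
from collections import defaultdict


def build_qgram_index(text, q):
    """Inverted file (uncompressed): q-gram -> sorted list of start positions."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def locate_inverted(index, q, pattern):
    """Locate a pattern of length >= q by intersecting shifted position lists."""
    if len(pattern) < q:
        raise ValueError("this sketch assumes len(pattern) >= q")
    # q-grams at offsets 0, q, 2q, ... plus the final q-gram cover the pattern.
    offsets = list(range(0, len(pattern) - q + 1, q))
    if offsets[-1] != len(pattern) - q:
        offsets.append(len(pattern) - q)
    candidates = None
    for off in offsets:
        # Shift each occurrence back to the implied start of the pattern.
        starts = {p - off for p in index.get(pattern[off:off + q], ())}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            return []
    return sorted(candidates)


def build_suffix_array(text):
    """Naive suffix array construction; fine for a small illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def locate_suffix_array(text, sa, pattern):
    """Two binary searches find the block of suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


text = "abracadabra"
index = build_qgram_index(text, 2)
sa = build_suffix_array(text)
print(locate_inverted(index, 2, "abra"))        # [0, 7]
print(locate_suffix_array(text, sa, "abra"))    # [0, 7]
```

In this sketch the inverted file reads occurrence positions directly from stored lists, while the suffix array finds a range and then reports its entries. In compressed suffix arrays, recovering each reported position typically costs extra work, which is one plausible reading of why the abstract finds inverted files faster when a pattern has many occurrences.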