Lempel-Ziv factorization using less time & space
Access Status
Authors
Date
2008Type
Metadata
Show full item recordCitation
Source Title
ISSN
Faculty
School
Collection
Abstract
For 30 years the Lempel-Ziv factorization LZ x of a string x = x[1..n] has been a fundamental data structure of string processing, especially valuable for string compression and for computing all the repetitions (runs) in x. Traditionally the standard method for computing LZ x was based on T(n)-time (or, depending on the measure used, O(n log n)-time) processing of the suffix tree ST x of x. Recently Abouelhoda et al. proposed an efficient Lempel-Ziv factorization algorithm based on an 'enhanced' suffix array - that is, a suffix array SA x together with supporting data structures, principally an 'interval tree'. In this paper we introduce a collection of fast space-efficient algorithms for LZ factorization, also based on suffix arrays, that in theory as well as in many practical circumstances are superior to those previously proposed; one family out of this collection achieves true T(n)-time alphabet-independent processing in the worst case by avoiding tree structures altogether.
Related items
Showing items related by title, author, creator and subject.
-
Puglisi, Simon; Smyth, William; Turpin, A. (2007)In 1990, Manber and Myers proposed suffix arrays as a space-saving alternative to suffix trees and described the first algorithms for suffix array construction and use. Since that time, and especially in the last few ...
-
Antonitio, A.; Ryan, P.; Smyth, Bill; Turpin, A.; Yu, X. (2004)In 2003 three (-)(n)-time algorithms were proposed for the construction of a suffix array of a string x = x[1..n] on an indexed alphabet, all of them inspired by the methodology of Farach's (-)(n)- time suffix tree ...
-
Puglisi, Simon; Smyth, William; Yusufu, M. (2010)Given a string x = x[1..n] on an alphabet of size α, and a threshold p min ≥ 1, we describe four variants of an algorithm PSY1 that, using a suffix array, computes all the complete nonextendible repeats in x of length p ...