- Turkish Journal of Electrical Engineering and Computer Science
- Volume:24 Issue:5
A new compression algorithm for fast text search
Authors : Aydin CARUS, Altan MESUT
Pages : 4355-4367
Publication Date : 0000-00-00
Article Type : Research Paper
Abstract : We propose a new compression algorithm that compresses plain texts by using a dictionary-based model, and a compressed string-matching approach that can be used with the compressed texts produced by this algorithm. The compression algorithm (CAFTS) can reduce the size of the texts to approximately 41% of their original sizes. The presented compressed string-matching approach (SoCAFTS), which can be used with any of the known pattern-matching algorithms, is compared with a powerful compressed string-matching algorithm (ETDC) and a compressed string-matching tool (Lzgrep). Although the search speed of ETDC is very good for short patterns, it can only search for exact words, and its compression performance differs from one natural language to another because of its word-based structure. Our experimental results show that SoCAFTS is a good solution when it is necessary to search for long patterns in a compressed document.
Keywords : Compressed string matching, text compression, dictionary-based compression, exact pattern matching, CAFTS
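Illustrative sketch : The general idea behind compressed string matching with a dictionary-based model can be shown with a toy example. The code below is not CAFTS, SoCAFTS, ETDC, or Lzgrep; it is a minimal sketch under the assumption of a hypothetical scheme in which every distinct word receives a fixed-width 2-byte code. The function names (`build_dictionary`, `compress`, `search`) are invented for illustration. The pattern is encoded with the same dictionary and then located in the compressed bytes with an ordinary byte-level search, so no decompression is needed.

```python
from typing import Dict, List

CODE_WIDTH = 2  # assumed fixed code width in bytes (hypothetical choice)


def build_dictionary(text: str) -> Dict[str, int]:
    """Assign each distinct whitespace-separated word an integer code."""
    codes: Dict[str, int] = {}
    for word in text.split():
        if word not in codes:
            codes[word] = len(codes)
    if len(codes) > 256 ** CODE_WIDTH:
        raise ValueError("too many distinct words for the chosen code width")
    return codes


def compress(text: str, codes: Dict[str, int]) -> bytes:
    """Replace every word by its fixed-width code."""
    return b"".join(codes[w].to_bytes(CODE_WIDTH, "big") for w in text.split())


def search(compressed: bytes, pattern: str, codes: Dict[str, int]) -> List[int]:
    """Return the word indices at which `pattern` occurs in the compressed text."""
    words = pattern.split()
    # A pattern containing a word absent from the dictionary cannot occur.
    if not words or any(w not in codes for w in words):
        return []
    needle = compress(pattern, codes)
    hits: List[int] = []
    pos = compressed.find(needle)  # any byte-level matcher could be used here
    while pos != -1:
        # Keep only matches aligned to a code boundary; unaligned ones are false.
        if pos % CODE_WIDTH == 0:
            hits.append(pos // CODE_WIDTH)
        pos = compressed.find(needle, pos + 1)
    return hits


text = "the quick brown fox jumps over the lazy dog the quick cat"
codes = build_dictionary(text)
data = compress(text, codes)
print(search(data, "the quick", codes))
```

Because this toy scheme is word-based, it can only find whole words, which is the same limitation the abstract attributes to word-based compression; it says nothing about how CAFTS builds its dictionary.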