A Partition Inverted Index Compression Algorithm Based on CRF
The inverted index is the core data structure of a large search engine,which is essentially a collection of integer se-quences in an inverted table.Inverted index compression can effectively reduce the space occupied by inverted indexes and im-prove retrieval efficiency of keywords.The partition inverted index compression algorithm based on conditional random field(CRF)mainly focuses on the partition mode of universe partition.The algorithm pre-partitions the sequence and uses conditional random fields to label and reorganize the pre-partitions,which effectively reduces the compression time.According to the parti-tion type,the algorithm uses the corresponding encoding method to further reduce the space occupation after compression.Com-pared with other inverted index compression algorithms,this algorithm outperforms some current universe partition algorithms in compression ratio,and is equivalent to other universes partition algorithms in decompression time.The algorithm achieves a good balance between time and space.
inverted indexdata compressionuniverse partitionconditional random fieldsearch engines