Tibetan Enhanced Suffix Array Construction Algorithm Based on Induced Sorting
Suffix arrays, BWT arrays, and LCP arrays are important data structures for full-text indexing and text compression. The BWT and LCP arrays are usually computed from an already constructed suffix array. SAIS, which is based on induced sorting, is one of the fastest suffix array construction algorithms. This paper improves SAIS and proposes a Tibetan suffix array algorithm, ITSBL. While inducing the suffix array, ITSBL computes the BWT without storing a complete suffix array in memory. It then processes the computed suffix array according to the characteristics of Tibetan syllable structure to obtain a suffix array and an LCP array in units of Tibetan syllables, so the results better match Tibetan usage habits. Compared with computing the suffix array, BWT, and LCP array separately, performance improves by about 10% on large texts and about 30% on small texts, which gives the algorithm some practical application value.
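To illustrate how the three arrays relate, the following is a minimal Python sketch of the conventional "separate calculation" baseline: build a suffix array, then derive the BWT and LCP arrays from it. It is not the ITSBL algorithm. The suffix array is built by naive sorting rather than induced sorting, the LCP array uses Kasai's algorithm, and all function names are hypothetical. The last function is only an assumption about what "in units of Tibetan syllables" could mean: it keeps the suffixes that start at a syllable boundary, taking the Tibetan tsheg (U+0F0B) as the delimiter.

```python
def suffix_array(text):
    # Naive construction by sorting all suffixes; a stand-in for SAIS, not SAIS itself.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    # BWT[k] is the character preceding suffix sa[k]; for sa[k] == 0,
    # index -1 wraps around to the last character (the sentinel).
    return "".join(text[i - 1] for i in sa)


def lcp_from_sa(text, sa):
    # Kasai's algorithm: lcp[k] is the length of the longest common prefix
    # of the suffixes at sa[k - 1] and sa[k]; lcp[0] is 0.
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def syllable_suffix_array(text, sa, delim="\u0f0b"):
    # Assumption for illustration: a syllable starts at position 0 or
    # immediately after a delimiter. Only those suffixes are kept,
    # in their original character-level order.
    starts = {0} | {i + 1 for i, ch in enumerate(text)
                    if ch == delim and i + 1 < len(text)}
    return [i for i in sa if i in starts]


if __name__ == "__main__":
    text = "banana$"  # "$" is a sentinel smaller than every other character
    sa = suffix_array(text)
    print(sa)
    print(bwt_from_sa(text, sa))
    print(lcp_from_sa(text, sa))
```

The baseline makes three passes and holds the whole suffix array in memory before the BWT is produced, which is the cost the abstract says ITSBL avoids by emitting the BWT during induction.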