Measurable Shapelets Extraction Based on Symbolic Representation for Time Series Classification
In time series classification, shapelet extraction methods based on symbolic representation achieve good classification accuracy and efficiency. However, measuring the quality of the symbols, for example by calculating TF-IDF scores, is time-consuming and computationally heavy, which lowers classification efficiency. In addition, a large number of shapelet candidates are still extracted, and their discriminative power needs to be improved. To solve these problems, this paper proposes a measurable shapelet extraction method based on symbolic representation. The method consists of three stages: time series data preprocessing, determining the shapelet candidate set, and learning shapelets, so that high-quality shapelets can be obtained quickly. In the data preprocessing stage, each time series is transformed into a symbolic aggregate approximation (SAX) representation to reduce the dimensionality of the original time series. In the candidate-set stage, Bloom filters are used to filter out repeated SAX words, and the remaining SAX words are stored in a hash table for quality measurement. The similarity between SAX words is then evaluated, and the final shapelet candidate set is determined based on the concepts of similarity and coverage. In the shapelet learning stage, a logistic regression model is used to learn the actual shapelets for time series classification. Extensive experiments are conducted on 32 datasets, and the results show that the proposed method ranks second in both average classification accuracy and average classification efficiency across these datasets. Compared with existing shapelet-based time series classification methods, the proposed method improves classification efficiency while maintaining accuracy, and it has good interpretability.
Time series classification; Shapelet; SAX means; Bloom filters; Logistic regression
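
The first two stages described in the abstract (SAX conversion, then Bloom-filter removal of repeated SAX words before they are stored in a hash table) can be illustrated with a minimal sketch. It is not the authors' implementation: the alphabet size of 4, the sliding-window scheme, the Bloom filter parameters, and all names (`sax_word`, `BloomFilter`, `unique_sax_words`) are assumptions made for illustration only. Because a Bloom filter can report false positives, a small fraction of genuinely new words may be skipped; it never reports a seen word as unseen.

```python
import hashlib
from statistics import mean, pstdev

# Breakpoints that split the standard normal distribution into
# 4 equiprobable regions (assumed alphabet size of 4).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax_word(series, n_segments):
    """Z-normalise, average each segment (PAA), and map means to symbols."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    seg_len = len(z) // n_segments  # assumes len(series) divisible by n_segments
    symbols = []
    for i in range(n_segments):
        seg_mean = mean(z[i * seg_len:(i + 1) * seg_len])
        # The symbol index is the number of breakpoints below the segment mean.
        symbols.append(ALPHABET[sum(seg_mean > b for b in BREAKPOINTS)])
    return "".join(symbols)


class BloomFilter:
    """Tiny Bloom filter; sizes are illustrative, not tuned."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive several hash positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def unique_sax_words(series, window, n_segments):
    """Slide a window over the series and keep each SAX word once."""
    seen = BloomFilter()
    table = {}  # hash table: SAX word -> first start offset
    for start in range(len(series) - window + 1):
        word = sax_word(series[start:start + window], n_segments)
        if word not in seen:
            seen.add(word)
            table[word] = start
    return table
```

For example, `sax_word([1, 2, 3, 4, 5, 6, 7, 8], 4)` maps the four segment means to increasing symbols, and `unique_sax_words` on a strictly periodic series keeps only one entry per distinct word. The quality measurement, similarity and coverage filtering, and logistic regression learning stages are not covered by this sketch.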