IntervalSketch:Approximate Statistical Method for Interval Items in Data Stream
The proportion of streaming databases is gradually increasing,and extracting the required information in the data streams of streaming databases is an important task.In this paper,we study interval items which refer to pairs of elements arri-ving with a fixed interval,and apply them to network scenarios.It is the first work to define and count interval items in data streams.To efficiently count the top-K interval items,IntervalSketch is proposed.IntervalSketch firstly chunks the data stream based on simulated annealing to accelerate the statistical speed,secondly,it uses Sketch to store the interval items,and lastly re-duces the memory of storing the interval items in Sketch through the feature grouping storage strategy,which enhances the accu-racy of counting the interval items.Extensive comparative experiments are carried out on two real datasets.Experimental results show that IntervalSketch significantly outperforms the baseline solution with the same memory,and the processing time is 1/3~1/2 of the baseline solution,the average absolute error and the average relative error are 1/3 of the baseline solution.