Semi-supervised Classification of Data Stream with Concept Drift Based on Clustering Model Reuse
Semi-supervised classification of data streams with concept drift poses challenges for classifier training, classifier adaptation to new concepts, and concept drift detection, because only some, or even very few, instances are labeled. In existing semi-supervised clustering-based classification algorithms, only the clustering models in the classifier pool are updated incrementally, and historical clustering models cannot be reused effectively. This paper therefore proposes CDCMR, a new semi-supervised classification algorithm based on clustering model reuse. First, the data stream arrives in the form of data chunks; after a data chunk has been classified, a clustering model that adaptively determines the number of clusters is trained on it. Second, multiple history classifiers are selected by calculating the similarity between each history classifier in the classifier pool and the clustering model. Third, the selected history classifiers are reused on the current data chunk and integrated with the clustering model. Then, for updating, the classifier pool is divided into an old-for-new replacement pool and a diversity-maximization pool. Finally, the samples of the next data chunk are classified by the ensemble. Experimental results on several artificial and real data sets show that the algorithm adapts effectively to concept drift and improves significantly on existing methods.
Data stream; Semi-supervised learning; Concept drift; Clustering model reuse; Ensemble learning
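
The selection and ensemble steps described in the abstract can be illustrated with a minimal sketch. It is not the CDCMR implementation: the abstract does not specify the similarity measure, the voting rule, or how the number of clusters is determined, so everything below is an assumption made for illustration. The names ClusterModel, similarity, select_history, and ensemble_predict are hypothetical. Similarity is assumed to be the negative mean distance between nearest centroids, the vote is assumed to be unweighted, and the cluster models are given directly rather than trained on a data chunk.

```python
from dataclasses import dataclass
import numpy as np


@dataclass(eq=False)
class ClusterModel:
    """Hypothetical cluster-based classifier: labeled centroids."""
    centroids: np.ndarray  # shape (k, d): one centroid per cluster
    labels: np.ndarray     # shape (k,): non-negative class label per cluster

    def predict(self, X):
        # Nearest-centroid rule: each instance takes the label of its closest cluster.
        dist = np.linalg.norm(X[:, None, :] - self.centroids[None, :, :], axis=2)
        return self.labels[dist.argmin(axis=1)]


def similarity(history, current):
    # Assumed measure (not from the paper): negative mean distance from each
    # current centroid to its nearest history centroid. Higher means more similar.
    dist = np.linalg.norm(
        current.centroids[:, None, :] - history.centroids[None, :, :], axis=2
    )
    return -dist.min(axis=1).mean()


def select_history(pool, current, m):
    # Keep the m history models most similar to the current clustering model.
    ranked = sorted(pool, key=lambda h: similarity(h, current), reverse=True)
    return ranked[:m]


def ensemble_predict(models, X):
    # Assumed combination rule: unweighted majority vote over all models.
    votes = np.stack([mdl.predict(X) for mdl in models])  # (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


# Toy example: two history models match the current concept, one does not.
current = ClusterModel(np.array([[0.0, 0.0], [5.0, 5.0]]), np.array([0, 1]))
pool = [
    ClusterModel(np.array([[0.2, -0.1], [4.8, 5.1]]), np.array([0, 1])),
    ClusterModel(np.array([[0.1, 0.1], [5.2, 4.9]]), np.array([0, 1])),
    ClusterModel(np.array([[20.0, 20.0], [30.0, 30.0]]), np.array([1, 0])),
]
selected = select_history(pool, current, m=2)

# Classify the next chunk with the selected history models plus the current one.
X_next = np.array([[0.3, 0.2], [4.9, 5.3]])
pred = ensemble_predict(selected + [current], X_next)
print(pred)  # [0 1]
```

In this toy setting the third history model, whose centroids lie far from the current clusters, is excluded from the ensemble, which is the intended effect of similarity-based selection under drift. The pool update by replacement and diversity maximization is not covered by this sketch.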