A Classification Method Based on Centroid and Ontology
The traditional TFIDF model is ill-suited to calculating the feature weights of documents in a root set, so this paper proposes a new model named TFIDF-2. In addition, three heuristic rules are given for obtaining the centroid vector corresponding to a root set. A document can then be classified by calculating its similarity to the centroids, which is only a preliminary application of the centroid. As part of this process, a new method for calculating the similarity between a document and a centroid is proposed. A series of experiments verifies that this classification method is more accurate and more efficient than traditional classification methods. Finally, this paper validates the effectiveness of combining ontology with the centroid for extracting relevant documents from an unlabeled dataset.
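
The general idea of classifying a document by its similarity to class centroids can be illustrated with a minimal sketch. The abstract does not give the TFIDF-2 formula, the three heuristic rules, or the proposed similarity measure, so the sketch substitutes standard smoothed TF-IDF weights, a plain mean for the centroid, and cosine similarity. All function names, labels, and the toy root sets are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def build_idf(docs):
    """Smoothed IDF over tokenized documents (standard TF-IDF, not TFIDF-2)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def weigh(doc, idf):
    """Sparse TF-IDF vector for one document; unseen terms get weight 0."""
    tf = Counter(doc)
    return {t: count * idf.get(t, 0.0) for t, count in tf.items()}

def centroid(vectors):
    """Centroid as the plain mean of sparse vectors (assumed, not the paper's rules)."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(doc_vec, centroids):
    """Return the label whose centroid is most similar to the document."""
    return max(centroids, key=lambda label: cosine(doc_vec, centroids[label]))

# Hypothetical toy root sets, one list of tokenized documents per class.
root_sets = {
    "sports": [["football", "goal", "team"], ["team", "match", "goal"]],
    "finance": [["stock", "market", "price"], ["market", "bank", "price"]],
}

all_docs = [doc for docs in root_sets.values() for doc in docs]
idf = build_idf(all_docs)
centroids = {
    label: centroid([weigh(doc, idf) for doc in docs])
    for label, docs in root_sets.items()
}

print(classify(weigh(["team", "goal", "match"], idf), centroids))
```

Each class is reduced to a single centroid vector, so classifying a new document costs one similarity computation per class rather than one per training document.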