Integrated Classification and Detection Method of Webpage Hidden Hyperlink Based on Differential Evolution
Hidden hyperlinks, also known as black hyperlinks, are links embedded in a website that are not easily detected by search engines. By embedding high-weight external links, they disrupt search engine rankings and damage the web environment. They resemble friendship links (reciprocal links between websites): although they can quickly and effectively raise a website's PageRank (PR) value, they also expose the website to a certain level of risk. To address the redundant feature sets and the curse of dimensionality found in current webpage hidden hyperlink detection methods, a machine learning detection method is proposed that uses a differential evolution algorithm to build an integrated (ensemble) classifier. First, filter-based feature selection was performed on the extracted initial feature set, and a second round of feature extraction was then carried out with principal component analysis (PCA). Finally, four classifiers (decision tree, random forest, AdaBoost and support vector machine) were integrated using differential evolution. The experimental results show that the method has high accuracy and reliability, with a correct recognition rate of 99.8442368%, and can provide strong practical support for search engines in detecting hidden hyperlink behavior.
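The pipeline described above (filter selection, then PCA, then a differential-evolution-weighted ensemble of four classifiers) can be sketched with scikit-learn and SciPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (the hidden-hyperlink feature set is not available here), the ANOVA F-test is assumed as the filter criterion, `k=20` and `n_components=10` are arbitrary, and differential evolution is assumed to search for soft-voting weights that minimise the validation error rate, since the abstract does not specify the fitness function.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the hidden-hyperlink feature set (assumption): 1 = hidden link.
X, y = make_classification(n_samples=600, n_features=40, n_informative=8, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest)

def with_reduction(clf):
    # Filter feature selection (ANOVA F-test, assumed), then PCA, then the classifier.
    return make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20),
                         PCA(n_components=10), clf)

base_models = [
    with_reduction(DecisionTreeClassifier(random_state=0)),
    with_reduction(RandomForestClassifier(n_estimators=100, random_state=0)),
    with_reduction(AdaBoostClassifier(random_state=0)),
    with_reduction(SVC(probability=True, random_state=0)),
]
for model in base_models:
    model.fit(X_train, y_train)

# Each column holds one base model's P(hidden link); shape is (n_samples, 4).
P_val = np.column_stack([m.predict_proba(X_val)[:, 1] for m in base_models])
P_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])

def ensemble_predict(weights, P):
    # Weighted soft vote: normalise the weights, threshold the blended probability at 0.5.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    return (P @ w >= 0.5).astype(int)

def validation_error(weights):
    # Fitness for differential evolution (minimised): validation error rate.
    return np.mean(ensemble_predict(weights, P_val) != y_val)

# One weight in [0, 1] per base classifier; polishing is off because the
# error rate is piecewise constant, so a gradient-based polish cannot help.
result = differential_evolution(validation_error, bounds=[(0.0, 1.0)] * 4,
                                seed=0, maxiter=50, tol=1e-6, polish=False)
weights = result.x / result.x.sum()
test_acc = np.mean(ensemble_predict(result.x, P_test) == y_test)
print("weights (DT, RF, AdaBoost, SVM):", np.round(weights, 3))
print("test accuracy:", round(float(test_acc), 3))
```

Tuning the weights on a separate validation split, rather than on the training data, keeps the ensemble from simply favouring whichever base model (typically the fully grown decision tree) overfits the training set.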