
Random K-nearest neighbor algorithm with learning process

The traditional K-nearest neighbor (KNN) algorithm has no learning process, must traverse all training samples when classifying, and is therefore slow and sensitive to the choice of k. To overcome these drawbacks, this paper proposes two random KNN (RKNN) algorithms with a learning process: SRKNN, which applies Bootstrap sampling to the training samples, and ARKNN, which applies Bootstrap sampling to the sample features. Both are Bagging ensemble methods that learn multiple simple KNN classifiers and output the class decided by their vote. Each simple KNN is built on a combined feature obtained by combining the original features of the samples. The paper focuses on how to select the optimal combination coefficients of the features and derives the selection rule and formula for the coefficients that achieve the best classification accuracy. Because learning is introduced when the simple KNNs are constructed, classification no longer traverses all training samples but only requires a binary search, so the classification time complexity of RKNN is an order of magnitude lower than that of traditional KNN. The classification accuracy of RKNN is also substantially higher than that of traditional KNN, and the approach resolves the difficulty of selecting k in the KNN algorithm. Both theoretical analysis and experimental results verify the effectiveness of the proposed RKNN algorithms.
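The abstract only outlines the method, so the sketch below is a minimal, hypothetical Python illustration of the SRKNN idea rather than the authors' implementation. It assumes that each base learner draws a Bootstrap resample of the training set, projects every sample onto one combined feature z = c1*x1 + ... + cd*xd, pre-sorts the training samples by z during the learning step, and classifies a query by binary search on z followed by a vote among the k nearest values; the ensemble then takes a majority vote over the base learners. The names SimpleKNN, train_srknn and predict_srknn are invented for this sketch, and the random combination coefficients merely stand in for the optimal coefficients the paper derives.

import bisect
import random
from collections import Counter

class SimpleKNN:
    """One base learner: KNN on a single combined feature, answered by binary search."""
    def __init__(self, X, y, coeffs, k=3):
        self.coeffs = coeffs
        self.k = k
        # "Learning" step: compute the combined feature and pre-sort the samples by it.
        z = [sum(c * xi for c, xi in zip(coeffs, x)) for x in X]
        order = sorted(range(len(X)), key=lambda i: z[i])
        self.z = [z[i] for i in order]
        self.y = [y[i] for i in order]

    def predict(self, x):
        zq = sum(c * xi for c, xi in zip(self.coeffs, x))
        i = bisect.bisect_left(self.z, zq)            # O(log n) lookup instead of a full scan
        lo, hi = max(0, i - self.k), min(len(self.z), i + self.k)
        nearest = sorted(range(lo, hi), key=lambda j: abs(self.z[j] - zq))[:self.k]
        return Counter(self.y[j] for j in nearest).most_common(1)[0][0]

def train_srknn(X, y, n_learners=25, k=3, seed=0):
    """Bagging: every base learner is trained on a Bootstrap resample of the rows."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]           # Bootstrap sample of the samples
        coeffs = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # placeholder combination coefficients
        learners.append(SimpleKNN([X[i] for i in idx], [y[i] for i in idx], coeffs, k))
    return learners

def predict_srknn(learners, x):
    """Ensemble output: majority vote over the simple KNNs."""
    return Counter(m.predict(x) for m in learners).most_common(1)[0][0]

# Toy usage: four 2-D points, two classes; a query near the "a" points should usually vote "a".
X = [[0.0, 1.0], [0.2, 0.9], [1.0, 0.1], [0.9, 0.0]]
y = ["a", "a", "b", "b"]
model = train_srknn(X, y, n_learners=11, k=1)
print(predict_srknn(model, [0.1, 0.95]))

Under these assumptions the ARKNN variant would Bootstrap-sample the feature indices instead of the rows before forming the combined feature. Each query then costs roughly O(log n + k) per base learner, versus O(n) for a plain KNN scan, which is where the order-of-magnitude reduction in classification time claimed above comes from.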

Keywords: machine learning; K-nearest neighbor algorithm; random K-nearest neighbor; Bagging ensemble learning; AdaBoost

Authors: 付忠良, 陈晓清, 任伟, 姚宇


Affiliation: Chengdu Institute of Computer Application, University of Chinese Academy of Sciences, Chengdu 610299, China


Funding: National Natural Science Foundation of China; Science and Technology Major Project of Sichuan Province (Grant Nos. 6197010131 and 2021YFS0019)

Journal: Journal of Jilin University (Engineering and Technology Edition)
Publisher: Jilin University
Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.792
ISSN: 1671-5497
Year, volume (issue): 2024, 54(1)