
A Method for Open Source License Detection and Compatibility Assessment Based on Ensemble Learning

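The security and reliability of the software supply chain strongly affect software quality and evolution, and license analysis of software components is an indispensable part of that supply chain. Open source licenses set the conditions under which open source software may be used, protecting intellectual property and sustaining the long-term development of open source software. To avoid legal risks and property losses, it is essential to identify the licenses of open source software and to determine whether those licenses are compatible with one another. This paper proposes an ensemble-learning-based method for detecting open source licenses and a compatibility-based license recommendation method. Specifically, license detection relies primarily on ensemble learning built on large language models and is supplemented by rule matching, while license compatibility assessment and recommendation are performed according to user requirements using a directed graph algorithm. Experiments show that, compared with traditional methods, the proposed method achieves better detection performance while offering lower maintenance cost and high extensibility, and that it effectively detects compatibility and recommends suitable licenses.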
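The detection step pairs an ensemble with rule matching, and the keywords mention sentence vector similarity. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. Two toy count-based vectorizers (`word_embed`, `bigram_embed`) stand in for LLM sentence embeddings, the ensemble averages their cosine similarities against hypothetical reference snippets (soft voting), and an `SPDX-License-Identifier` regex serves as the auxiliary rule used only when the ensemble is not confident. The names `REFERENCE_TEXTS`, `detect_license`, and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence

Vector = Dict[str, float]
Embedder = Callable[[str], Vector]

# Hypothetical reference snippets; a real system would use full license texts.
REFERENCE_TEXTS: Dict[str, str] = {
    "MIT": "Permission is hereby granted, free of charge, to any person "
           "obtaining a copy of this software",
    "Apache-2.0": "Licensed under the Apache License, Version 2.0 (the License); "
                  "you may not use this file except in compliance with the License",
    "GPL-3.0-or-later": "This program is free software: you can redistribute it "
                        "and/or modify it under the terms of the GNU General Public "
                        "License as published by the Free Software Foundation, either "
                        "version 3 of the License, or (at your option) any later version",
}

# Auxiliary rule: an explicit SPDX tag in the scanned text.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def _tokens(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def word_embed(text: str) -> Vector:
    """Toy stand-in for an LLM sentence embedding: unigram counts."""
    return {w: float(c) for w, c in Counter(_tokens(text)).items()}


def bigram_embed(text: str) -> Vector:
    """A second toy ensemble member: word-bigram counts."""
    toks = _tokens(text)
    return {f"{a} {b}": float(c) for (a, b), c in Counter(zip(toks, toks[1:])).items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ensemble_scores(text: str, members: Sequence[Embedder]) -> Dict[str, float]:
    """Soft voting: average each member's similarity to every reference license."""
    scores: Dict[str, float] = {}
    for lic, ref in REFERENCE_TEXTS.items():
        sims = [cosine(embed(text), embed(ref)) for embed in members]
        scores[lic] = sum(sims) / len(sims)
    return scores


def detect_license(
    text: str,
    members: Sequence[Embedder] = (word_embed, bigram_embed),
    threshold: float = 0.5,
) -> Optional[str]:
    scores = ensemble_scores(text, members)
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best  # primary path: confident ensemble prediction
    match = SPDX_TAG.search(text)
    return match.group(1) if match else None  # auxiliary rule, else unknown
```

For example, `detect_license("SPDX-License-Identifier: Apache-2.0")` scores too low on text similarity and is resolved by the rule, while a verbatim MIT snippet is resolved by the ensemble. Swapping the toy vectorizers for real sentence-embedding models would leave the voting logic unchanged.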
Ensemble Learning Based Open Source License Detection and Compatibility Assessment
The quality and evolution of software are profoundly influenced by the security and reliability of the software supply chain. An essential element of this chain is the analysis of the licenses associated with software components. Open source licenses define the conditions for using open source software, safeguarding intellectual property and sustaining the long-term development of open source projects. To mitigate legal risks and prevent property losses, it is imperative to accurately identify open source software licenses and assess their compatibility. In this paper, we propose a method for detecting open source licenses using ensemble learning, complemented by a compatibility-based license recommendation method. Our approach is built primarily on ensemble learning over large language models and is supplemented by rule matching to improve the accuracy of open source license detection. Compatibility assessments and license recommendations are then derived using directed graph algorithms. Experimental results show that, compared with traditional methods, our method achieves better detection performance with lower maintenance costs and higher scalability. It also identifies compatibility issues effectively and provides dependable recommendations, thereby contributing to a more secure and reliable software supply chain.
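The abstract states that compatibility assessment and recommendation are derived with directed graph algorithms but does not give the graph. A minimal sketch, assuming an edge A -> B means code under license A may be incorporated into a work distributed under license B: compatibility becomes reachability (here via breadth-first search), and a recommended outbound license is one reachable from every dependency license. The edge set in `COMPAT_GRAPH` is a simplified illustration chosen for this sketch, not the paper's graph, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Illustrative, simplified edge set (an assumption, not the paper's graph):
# an edge A -> B means code under A may be incorporated into a work under B.
COMPAT_GRAPH: Dict[str, List[str]] = {
    "MIT": ["Apache-2.0", "GPL-2.0-only"],
    "BSD-3-Clause": ["Apache-2.0", "GPL-2.0-only"],
    "Apache-2.0": ["GPL-3.0-or-later"],
    "GPL-2.0-only": [],
    "GPL-3.0-or-later": [],
}


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search; a license is trivially compatible with itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_compatible(graph: Dict[str, List[str]], inbound: str, outbound: str) -> bool:
    """Can code under `inbound` be distributed as part of an `outbound` work?"""
    return outbound in reachable(graph, inbound)


def recommend(graph: Dict[str, List[str]], dependency_licenses: Iterable[str]) -> List[str]:
    """Licenses that every dependency license can flow into (intersection of reach sets)."""
    candidates: Set[str] = set(graph)
    for targets in graph.values():
        candidates.update(targets)
    for lic in dependency_licenses:
        candidates &= reachable(graph, lic)
    return sorted(candidates)
```

With this toy graph, a project depending on MIT and Apache-2.0 code gets `["Apache-2.0", "GPL-3.0-or-later"]`, whereas Apache-2.0 plus GPL-2.0-only yields an empty list, which signals a conflict. Using transitive reachability rather than only direct edges lets the graph stay small while still capturing chains such as MIT -> Apache-2.0 -> GPL-3.0-or-later.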

Keywords: Large language model; Ensemble learning; Open source license; Sentence vector similarity; Compatibility assessment

Bai Jianghao, Piao Yong


School of Software, Dalian University of Technology, Dalian, Liaoning 116620, China


2024

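Computer Science
Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)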


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(12)