Ensemble Learning Based Open Source License Detection and Compatibility Assessment
The quality and evolution of software are profoundly influenced by the security and reliability of the software supply chain. An essential element of this chain is the analysis of licenses associated with different software components. Open source licenses play a vital role in defining conditions for using open source software, safeguarding intellectual property, and ensuring the sustained development of open source projects. To mitigate legal risks and protect against property losses, it is imperative to accurately identify open source software licenses and assess their compatibility. In this paper, we propose an innovative method for detecting open source licenses using ensemble learning, complemented by a recommendation system based on compatibility. Our main approach leverages ensemble learning techniques, particularly emphasizing the use of large language models. To bolster the accuracy of open source license detection, this methodology is augmented with rule matching. Subsequently, compatibility assessments and license recommendations are derived using directed graph algorithms. Experimental results validate the effectiveness of our method, showcasing not only reduced maintenance costs and heightened scalability but also superior detection performance in comparison to traditional methods. The proposed approach excels in identifying compatibility issues and provides dependable recommendations, thereby contributing to a more secure and reliable software supply chain.
Large language model; Ensemble learning; Open source license; Sentence vector similarity; Compatibility assessment
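
The directed-graph step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that an edge A -> B means code under license A may be incorporated into a work released under license B, so compatibility reduces to reachability and a recommendation is any license reachable from every component license. The edge set, the SPDX-style identifiers, and the function names `reachable`, `is_compatible`, and `recommend` are hypothetical choices made for illustration; they are not taken from the paper's implementation or its actual compatibility graph.

```python
from collections import deque

# Hypothetical, deliberately small compatibility graph (an assumption for
# illustration): an edge A -> B means code under A may be combined into a
# work released under B.
COMPAT_EDGES = {
    "MIT": ["BSD-3-Clause"],
    "BSD-3-Clause": ["Apache-2.0"],
    "Apache-2.0": ["GPL-3.0-only"],
    "GPL-3.0-only": ["AGPL-3.0-only"],
    "GPL-2.0-only": [],
    "AGPL-3.0-only": [],
}


def reachable(source):
    """Return every license reachable from `source`, including itself (BFS)."""
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for target in COMPAT_EDGES.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def is_compatible(component_license, project_license):
    """A component is usable if the project license is reachable from it."""
    return project_license in reachable(component_license)


def recommend(component_licenses):
    """Licenses reachable from all components, i.e. valid project licenses."""
    if not component_licenses:
        return []
    candidates = set.intersection(*(reachable(lic) for lic in component_licenses))
    return sorted(candidates)


if __name__ == "__main__":
    print(is_compatible("MIT", "GPL-3.0-only"))
    print(recommend(["MIT", "Apache-2.0"]))
    print(recommend(["Apache-2.0", "GPL-2.0-only"]))
```

Under these assumptions, an empty recommendation list signals a compatibility conflict among the components, since no single outbound license is reachable from all of them.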