Application of a Non-Functional Requirement Classification Method Based on Tri-Training Semi-Supervised Learning in Industrial Software

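Abstract (translated from the Chinese): Building on the strength of the Word2Vec Skip-gram model at capturing subtle semantic differences in complex software requirement documents, this paper proposes a non-functional requirement classification method based on Tri-Training semi-supervised learning. The method targets the shortage of labeled samples in software requirements engineering and the resulting drop in non-functional requirement classification performance. Unlike traditional semi-supervised learning algorithms that rely on fully redundant views or a single classifier, Tri-Training initializes three different classifiers with three different labeled datasets produced by bootstrap sampling, and uses the three classifiers under a majority-voting rule to generate pseudo-labeled data, which lifts restrictions on the training set and improves the generality and usability of the classification framework. The method is applied to the PROMISE software requirements dataset, which covers multiple industrial domains. The results show that it achieves good classification performance on datasets with different labeled proportions; in particular, when labeled data are scarce, it has a significant advantage in recall and F1 score over supervised learning and other semi-supervised learning algorithms.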

English title: Application of a Tri-Training Semi-Supervised Learning Method for Non-Functional Requirement Classification in Industrial Software

English abstract: We combine the advantages of the Word2Vec Skip-gram model in extracting subtle semantic differences from complex software requirement documents and propose a non-functional requirement classification method based on Tri-Training semi-supervised learning. This approach addresses the challenge of limited labeled samples in software requirements engineering, thus mitigating the performance degradation in non-functional requirement classification. Unlike traditional semi-supervised learning algorithms applied to entirely redundant views or a single classifier, the semi-supervised Tri-Training algorithm initializes three distinct classifiers with three different labeled datasets generated through bootstrapping. It employs the majority voting rule among these classifiers to produce pseudo-labeled data, thereby relaxing constraints on the training set and improving the generality and applicability of the classification framework. The method is applied to the PROMISE software requirements dataset, which covers multiple industrial domains. The results demonstrate that the method achieves good classification performance across datasets with various labeled proportions, particularly when labeled data are insufficient. Compared to supervised learning and other semi-supervised learning algorithms, this method shows significant advantages in recall and F1 score.
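The bootstrap-then-vote loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `NearestCentroid` is a tiny stand-in base classifier, the plain numeric vectors stand in for Word2Vec Skip-gram embeddings of requirement sentences, and `tri_train` and `predict` are hypothetical helper names. The sketch also simplifies the original Tri-Training algorithm by omitting its error-rate checks: a classifier simply receives every unlabeled example on which the other two classifiers agree.

```python
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base classifier (assumption): predicts the class with the closest mean vector."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def bootstrap(X, y, rng):
    """Draw len(X) labeled examples with replacement."""
    idx = [rng.randrange(len(X)) for _ in X]
    return [X[i] for i in idx], [y[i] for i in idx]


def tri_train(X_lab, y_lab, X_unlab, rounds=5, seed=0):
    """Simplified Tri-Training: no error-rate checks (assumption, see lead-in)."""
    rng = random.Random(seed)
    # Three bootstrap samples initialize three different classifiers.
    samples = [bootstrap(X_lab, y_lab, rng) for _ in range(3)]
    clfs = [NearestCentroid().fit(Xs, ys) for Xs, ys in samples]
    prev = None
    for _ in range(rounds):
        pseudo = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Classifier i receives examples on which the other two agree.
            agreed = []
            for idx, x in enumerate(X_unlab):
                label = clfs[j].predict_one(x)
                if label == clfs[k].predict_one(x):
                    agreed.append((idx, label))
            pseudo.append(agreed)
        if pseudo == prev:  # pseudo-labels stopped changing
            break
        for i in range(3):
            Xs, ys = samples[i]
            clfs[i] = NearestCentroid().fit(
                Xs + [X_unlab[idx] for idx, _ in pseudo[i]],
                ys + [label for _, label in pseudo[i]],
            )
        prev = pseudo
    return clfs


def predict(clfs, x):
    """Final label by majority vote over the three classifiers."""
    votes = Counter(c.predict_one(x) for c in clfs)
    return votes.most_common(1)[0][0]
```

A caller would pass a small labeled set and a larger unlabeled set to `tri_train`, then classify new vectors with `predict`. In practice the base classifier and the feature extraction would be replaced by whatever the experimental setup requires.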

Keywords (English):

software requirement classification; semi-supervised learning; Tri-Training

Authors:

宋百灵、何彦众、张泽贤、曾诚、俞嘉怡、刘进、胡文华

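Affiliations:

School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, Hubei

School of Artificial Intelligence, Hubei University, Wuhan 430062, Hubei

School of Computer Science, Wuhan University, Wuhan 430072, Hubei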

Keywords (translated from the Chinese):

software requirement classification; semi-supervised learning; Tri-Training

Funding:

National Natural Science Foundation of China; Key Research and Development Program of Hubei Province

Project numbers:

62202350; 2021BAA188

Publication year:

2024

DOI:

10.14188/j.1671-8836.2023.0227

Journal of Wuhan University (Natural Science Edition) (武汉大学学报(理学版))

Wuhan University

CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.814

ISSN: 1671-8836

Year, volume (issue): 2024, 70(3)