Journal information
Software quality journal
Springer

Quarterly

ISSN: 0963-9314

Indexed in: SCI, ISTP, EI
Officially published
Years covered

    Evaluating the effectiveness of neuron coverage metrics: a metamorphic-testing approach

    Zenghui Zhou, Pak-Lok Poon, Tsong Yueh Chen, Kun Qiu...
    pp. 19.1-19.25
    Abstract: Deep neural networks (DNNs) are now widely used in many sectors of our society. This phenomenon also means that if these DNNs contain faults, they will have profound adverse impacts on our daily lives. Thus, DNNs have to be comprehensively tested for "correctness" before they are released for use. Since such testing involves the use of a DNN test set, the comprehensiveness of this test set is of utmost importance. Until now, many researchers have proposed their own neuron-coverage (NC) metrics to measure the comprehensiveness of a DNN test set. However, their studies solely focused on those DNN testing scenarios with the presence of a test oracle. We observed that, in reality, there are many DNN testing scenarios where a test oracle does not exist and, therefore, the results of all previous studies may be inapplicable to these testing scenarios. Inspired by this observation, we have performed an empirical study to investigate the usefulness of some common and major NC metrics in terms of correlation analysis and invariability analysis. Our experimental results showed that, on the one hand, some NC metrics are useful measures of DNN test-set comprehensiveness (in terms of correlation analysis), but on the other hand, these metrics are not robust enough (in terms of invariability analysis).
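
The abstract does not say which NC metrics were studied. As a minimal sketch, assuming the simplest threshold-based form of neuron coverage (the share of neurons whose activation exceeds a threshold for at least one test input), the idea can be illustrated as follows. The function name, the activation matrix layout, and the threshold value are hypothetical and not taken from the paper.

```python
from typing import Sequence


def neuron_coverage(activations: Sequence[Sequence[float]],
                    threshold: float = 0.0) -> float:
    """Fraction of neurons activated above `threshold` by at least one input.

    `activations[i][j]` is assumed to hold the recorded output of neuron j
    for test input i (a hypothetical layout chosen for this sketch).
    """
    if not activations or not activations[0]:
        return 0.0
    num_neurons = len(activations[0])
    covered = [False] * num_neurons
    for row in activations:
        for j, value in enumerate(row):
            if value > threshold:
                covered[j] = True
    return sum(covered) / num_neurons


# Two test inputs, three neurons: neurons 0 and 2 exceed 0.5 at least once.
coverage = neuron_coverage([[0.9, 0.0, 0.1], [0.2, 0.0, 0.6]], threshold=0.5)
```

Here two of the three neurons are covered, so `coverage` is 2/3. The published NC metrics may differ in how they define "activated" and which layers they count.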

    Test case optimization using grey wolf algorithm

    Srishti Kumari, Shweta Jindal, Arun Sharma
    pp. 20.1-20.30
    Abstract: Testing is an important part of any software development process. Increased effectiveness of software testing and reduction in cost can be achieved by reordering test cases, called Test Case Optimization (TCO). This paper proposes a new algorithm for multi-objective TCO using the Grey Wolf Optimization (GWO) algorithm; the approach is based on optimizing maximum fault coverage with minimum runtime of test cases. GWO is a nature-inspired optimization method based on how grey wolves hunt. To achieve the research purpose, a comprehensive literature review of the GWO algorithm, software testing, and related optimization techniques was conducted. Based on those findings, this paper presents a new algorithm, named GWO_TCO, which combines GWO with the traveling salesman problem concept to optimize test cases in software regression testing. The proposed algorithm has been evaluated and analyzed over seventeen open-source problems along with one benchmark flex dataset. The experimentation is aimed at reducing the size and time of the resultant test suite. Also, an average percentage correctness of 72% has been achieved. Further, it was found that the average percentage of fault detection for each program was nearly 85% or more, going up to 100% fault coverage. Hence, GWO_TCO reduces the number of test cases required for software testing while maintaining a high level of early fault detection in a relatively short time, making it practical for real-world software testing scenarios. The study's findings demonstrated that, in terms of the number of defects found and the effectiveness of the testing as a whole, the proposed algorithm performed better than almost all of the conventional approaches, providing valuable insights for software developers and testers to improve their testing processes and reduce testing time and costs.

    Predictive framework of software reliability analysis under multiple change points and imperfect debugging

    Nageswari N, Ansuman Mahapatra, G. S. Mahapatra
    pp. 21.1-21.18
    Abstract: Software often goes through multiple testing phases, which can lead to discovering hidden faults. We propose a software hazard rate model with an imperfect debugging framework incorporating Multiple Change Points (MCP). This approach aims to improve Software Reliability Growth Models, providing a more accurate representation of the real-world testing environment during the software development process. The proposed model has fewer parameters, which offers better fitting across various datasets while reducing complexity compared to existing models. Four distinct real-world datasets are used to assess goodness of fit, demonstrating its efficacy and producing a more efficient and general model. This study integrated an MCP into the Jelinski-Moranda model to develop a hazard rate approach. Our model predicts software reliability with greater precision compared to existing models. Akaike's Information Criterion, Root Mean Squared Error, Hazard Rate Approach, Absolute Error, Average Error, Multiple Determination Coefficient, and Relative Predictive Error depict favourable outcomes for the MCP model.
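
For orientation, the classic Jelinski-Moranda model that the paper extends assumes a hazard rate proportional to the number of faults still in the software: before the i-th failure the rate is phi * (N - (i - 1)), where N is the initial fault count and phi the per-fault rate. The sketch below covers only this textbook baseline. It does not reproduce the paper's change-point or imperfect-debugging terms, which the abstract does not specify, and the function and parameter names are hypothetical.

```python
import math


def jm_hazard_rate(total_faults: int, phi: float, failure_index: int) -> float:
    """Classic Jelinski-Moranda hazard rate before the i-th failure (1-based).

    Assumes perfect debugging: each observed failure removes exactly one fault.
    """
    remaining = total_faults - (failure_index - 1)
    if failure_index < 1 or remaining < 0:
        raise ValueError("failure_index must be between 1 and total_faults + 1")
    return phi * remaining


def jm_reliability(total_faults: int, phi: float,
                   failure_index: int, t: float) -> float:
    """Probability of surviving a further time t before the i-th failure."""
    return math.exp(-jm_hazard_rate(total_faults, phi, failure_index) * t)


# With N = 10 faults and phi = 0.5, the rate drops by 0.5 after each fix.
rates = [jm_hazard_rate(10, 0.5, i) for i in range(1, 5)]
```

A change-point variant would let phi take different values in different testing phases, and imperfect debugging would let a fix fail to reduce the remaining fault count; both are left out here.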

    Supporting the identification of prevalent quality issues in code changes by analyzing reviewers' feedback

    Umar Iftikhar, Juergen Boerstler, Nauman Bin Ali, Oliver Kopp...
    pp. 22.1-22.34
    Abstract: Context: Code reviewers provide valuable feedback during the code review. Identifying common issues described in the reviewers' feedback can provide input for devising context-specific software development improvements. However, the use of reviewer feedback for this purpose is currently less explored. Objective: In this study, we assess how automation can derive more interpretable and informative themes in reviewers' feedback and whether these themes help to identify recurring quality-related issues in code changes. Method: We conducted a participatory case study using the JabRef system to analyze reviewers' feedback on merged and abandoned code changes. We used two promising topic modeling methods (GSDMM and BERTopic) to identify themes in 5,560 code review comments. The resulting themes were analyzed and named by a domain expert from JabRef. Results: The domain expert considered the identified themes from the two topic models to represent quality-related issues. Different quality issues are pointed out in code reviews for merged and abandoned code changes. While BERTopic provides higher objective coherence, the domain expert considered themes from short-text topic modeling more informative and easier to interpret than BERTopic-based topic modeling. Conclusions: The identified prevalent code quality issues aim to address the maintainability-focused issues. The analysis of code review comments can enhance the current practices for JabRef by improving the guidelines for new developers and focusing discussions in the developer forums. The topic model choice impacts the interpretability of the generated themes, and a higher coherence (based on objective measures) of generated topics did not lead to improved interpretability by a domain expert.

    Teamwork quality analysis - the development journey from a questionnaire to a playable game to address different preferences of teams to optimize engagement and effectivity

    Alexander Poth, Mario Kottke
    pp. 23.1-23.23
    Abstract: The evolution of quality management has progressed from a focus on product quality, initially characterized by craftsmanship, to a focus on process quality with industrialization. In recent years, the emphasis has shifted towards agile development, making the team the central focus, and, consequently, team quality has become a focus. Approximately five years ago, Volkswagen Group IT's quality management began to emphasize agile teamwork quality. This focus is crucial for handling the "soft aspects" of product development and service delivery. This research work details the development and evolution of the Teamwork Quality Analysis (TQA) approach into a game, which can be adapted to various scenarios and settings. Some variants of the game align well with agile rituals, such as retrospectives. The methodological approach used in game development is also presented. The evaluation of the game includes findings from its implementation in large-scale programs and adoption in established line-organizations. The initial game development targeted Scrum and SAFe® teams within the Volkswagen Group IT. However, the TQA game was later generalized during a development iteration to be applicable for teams beyond software development.

    Comparative analysis of text mining and clustering techniques for assessing functional dependency between manual test cases

    Sahar Tahvili, Leo Hatvani, Michael Felderer, Francisco Gomes de Oliveira Neto...
    pp. 24.1-24.36
    Abstract: Text mining techniques, particularly those leveraging machine learning for natural language processing, have gained significant attention for qualitative data analysis in software testing. However, their complexity and lack of transparency can pose challenges, especially in safety-critical domains where simpler, interpretable solutions are often preferred unless accuracy is heavily compromised. This study investigates the trade-offs between complexity, effort, accuracy, and utility in text mining and clustering techniques, focusing on their application for detecting functional dependencies among manual integration test cases in safety-critical systems. Using empirical data from an industrial testing project at ALSTOM Sweden, we evaluate various string distance methods, NCD compressors, and machine learning approaches. The results highlight the impact of preprocessing techniques, such as tokenization, and intrinsic factors, such as text length, on algorithm performance. Findings demonstrate how text mining and clustering can be optimized for safety-critical contexts, offering actionable insights for researchers and practitioners aiming to balance simplicity and effectiveness in their testing workflows.
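
The "NCD compressors" mentioned in the abstract presumably refer to the normalized compression distance, which estimates how similar two texts are from how well they compress together: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The following is a minimal sketch under that assumption, using zlib as the compressor. The abstract does not say which compressors the study evaluated, and the test-case strings are invented for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts (lower = more similar)."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical manual test-case descriptions.
case_a = "Verify that the brake controller engages when the emergency signal is raised."
case_b = "Check door status lamp after power cycle and log the measured voltage."

same = ncd(case_a, case_a)       # a text compared with itself
different = ncd(case_a, case_b)  # two unrelated descriptions
```

A text compared with itself yields a smaller distance than two unrelated descriptions. On very short strings the compressor's fixed overhead distorts the value, which is consistent with the abstract's remark that text length affects algorithm performance.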

    QNet: exploring deep learning for quantum code smell detection

    Ruchika Malhotra, Bhawna Jain, Marouane Kessentini
    pp. 25.1-25.28
    Abstract: Quantum computing (QC) has surged as a burgeoning domain, driving the evolution of novel programming paradigms. Despite extensive exploration, a critical aspect remains underexplored: detecting code smells in quantum programs (QPs). Code smells, indicative of potential maintenance challenges, have been extensively studied in classical programming but pose unique hurdles in the quantum realm due to inherent disparities caused by unstable states in QC. This paper proposes a novel approach leveraging deep learning (DL) techniques for detecting quantum code smells (QCS). A comprehensive ablation study compares DL methodologies, including Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM), and proposes a hybrid model combining CNN, LSTM, and GRU layers. Critical research questions are posed regarding QCS influence on error rates in quantum computers, the relative impact of different QCS on quantum system performance, and the complementary behaviour of distinct DL models in QCS detection. Through manual curation of a labelled dataset comprising 136 open-source projects, quantum circuits are extracted and analyzed. When evaluated on the quantum dataset, this hybrid model outperforms single-layer models. Furthermore, a comparative analysis with a transfer learning (TL) approach employing a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model underscores the superiority of the proposed DL-based solution. The proposed model achieves an impressive accuracy rate of 92.86%, surpassing existing DL and TL approaches. In conclusion, this research demonstrates the potential of DL to identify QCS, with the hybrid model offering avenues for further discoveries in QCS detection using DL.