Dissemin is shutting down on January 1st, 2025

Published in

Association for Computing Machinery (ACM), ACM Transactions on Software Engineering and Methodology, 6(32), p. 1-34, 2023

DOI: 10.1145/3593802

Links

Tools

Export citation

Search in Google Scholar

Predicting the Change Impact of Resolving Defects by Leveraging the Topics of Issue Reports in Open Source Software Systems

Journal article published in 2023 by Maram Assi ORCID, Safwat Hassan ORCID, Stefanos Georgiou ORCID, Ying Zou ORCID
This paper was not found in any repository, but could be made available legally by the author.
This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Green circle
Preprint: archiving allowed
Green circle
Postprint: archiving allowed
Red circle
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Upon receiving a new issue report, practitioners start by investigating the defect type, the potential fixing effort needed to resolve the defect and the change impact. Moreover, issue reports contain valuable information, such as, the title, description and severity, and researchers leverage the topics of issue reports as a collective metric portraying similar characteristics of a defect. Nonetheless, none of the existing studies leverage the defect topic, i.e., a semantic cluster of defects of the same nature, such as Performance, GUI, and Database , to estimate the change impact that represents the amount of change needed in terms of code churn and the number of files changed. To this end, in this article, we conduct an empirical study on 298,548 issue reports belonging to three large-scale open-source systems, i.e., Mozilla, Apache, and Eclipse, to estimate the change impact in terms of code churn or the number of files changed while leveraging the topics of issue reports. First, we adopt the Embedded Topic Model (ETM), a state-of-the-art topic modelling algorithm, to identify the topics. Second, we investigate the feasibility of predicting the change impact using the identified topics and other information extracted from the issue reports by building eight prediction models that classify issue reports requiring small or large change impact along two dimensions, i.e., the code churn size and the number of files changed. Our results suggest that XGBoost is the best-performing algorithm for predicting the change impact, with an AUC of 0.84, 0.76, and 0.73 for the code churn and 0.82, 0.71, and 0.73 for the number of files changed metric for Mozilla, Apache, and Eclipse, respectively. Our results also demonstrate that the topics of issue reports improve the recall of the prediction model by up to 45%.