Predicting the Change Impact of Resolving Defects by Leveraging the Topics of Issue Reports in Open Source Software Systems

Assi, Maram; Hassan, Safwat; Georgiou, Stefanos; Zou, Ying

Published in

Association for Computing Machinery (ACM), ACM Transactions on Software Engineering and Methodology, 6(32), p. 1-34, 2023

DOI: 10.1145/3593802

Tools

Export citation

Search in Google Scholar

Predicting the Change Impact of Resolving Defects by Leveraging the Topics of Issue Reports in Open Source Software Systems

Journal article published in 2023 by Maram Assi

, Safwat Hassan

, Stefanos Georgiou

, Ying Zou

This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed

Upload

Postprint: archiving allowed

Upload

Published version: archiving forbidden

Policy details

Data provided by

Abstract

Upon receiving a new issue report, practitioners start by investigating the defect type, the potential fixing effort needed to resolve the defect and the change impact. Moreover, issue reports contain valuable information, such as, the title, description and severity, and researchers leverage the topics of issue reports as a collective metric portraying similar characteristics of a defect. Nonetheless, none of the existing studies leverage the defect topic, i.e., a semantic cluster of defects of the same nature, such as Performance, GUI, and Database , to estimate the change impact that represents the amount of change needed in terms of code churn and the number of files changed. To this end, in this article, we conduct an empirical study on 298,548 issue reports belonging to three large-scale open-source systems, i.e., Mozilla, Apache, and Eclipse, to estimate the change impact in terms of code churn or the number of files changed while leveraging the topics of issue reports. First, we adopt the Embedded Topic Model (ETM), a state-of-the-art topic modelling algorithm, to identify the topics. Second, we investigate the feasibility of predicting the change impact using the identified topics and other information extracted from the issue reports by building eight prediction models that classify issue reports requiring small or large change impact along two dimensions, i.e., the code churn size and the number of files changed. Our results suggest that XGBoost is the best-performing algorithm for predicting the change impact, with an AUC of 0.84, 0.76, and 0.73 for the code churn and 0.82, 0.71, and 0.73 for the number of files changed metric for Mozilla, Apache, and Eclipse, respectively. Our results also demonstrate that the topics of issue reports improve the recall of the prediction model by up to 45%.

Published in

Links

Tools

Predicting the Change Impact of Resolving Defects by Leveraging the Topics of Issue Reports in Open Source Software Systems

Abstract