Association for Computing Machinery (ACM), ACM Transactions on Software Engineering and Methodology, 2022
DOI: 10.1145/3576036
Full text: Unavailable
Due to the exploratory nature of computational notebook development, a notebook can be extensively evolved even though it is small, potentially incurring substantial technical debt. Indeed, in interview studies notebook authors have attested to performing on-going tidying and big cleanups. However, many notebook authors are not trained as software developers, and environments like JupyterLab possess few features to aid notebook maintenance. As software refactoring is traditionally a critical tool for reducing technical debt, we sought to better understand the unique and growing ecology of computational notebooks by investigating the refactoring of public Jupyter notebooks. We randomly selected 15,000 Jupyter notebooks hosted on GitHub and studied 200 with meaningful commit histories. We found that notebook authors do refactor, favoring a few basic classic refactorings as well as those involving the notebook cell construct. Those with a computing background refactored differently than others, but not more so. Exploration-focused notebooks had a unique refactoring profile compared to more exposition-focused notebooks. Authors more often refactored their code as they went along, rather than deferring maintenance to big cleanups. These findings point to refactoring being intrinsic to notebook development.