Understanding Feedback Mechanisms in Machine Learning Jupyter Notebooks
Abstract
The machine learning development lifecycle is characterized by iterative and exploratory processes that rely on feedback mechanisms to ensure data and model integrity. Despite the critical role of feedback in machine learning engineering, no prior research has been conducted to identify and understand these mechanisms. To address this knowledge gap, we mine 297.8 thousand Jupyter notebooks and analyse 2.3 million code cells. We identify three key feedback mechanisms -- assertions, print statements and last cell statements -- and further categorize them into implicit and explicit forms of feedback. Our findings reveal extensive use of implicit feedback for critical design decisions and the relatively limited adoption of explicit feedback mechanisms. By conducting detailed case studies with selected feedback instances, we uncover the potential for automated validation of critical assumptions in ML workflows using assertions. Finally, this study underscores the need for improved documentation, and provides practical recommendations on how existing feedback mechanisms in the ML development workflow can be effectively used to mitigate technical debt and enhance reproducibility.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2024
- DOI:
- arXiv:
- arXiv:2408.00153
- Bibcode:
- 2024arXiv240800153S
- Keywords:
-
- Computer Science - Software Engineering