Empirical Analysis of Pull Requests for Google Summer of Code
Abstract
Internships and industry-affiliated capstone projects are popular ways to expose students to real-world experience and to bridge the gap between academic training and industry requirements. However, both approaches require active industry collaboration, and many students struggle to find industry placements. Open-source contributions are a crucial alternative: they let students gain real-world experience, earn publicly verifiable contributions with real-world impact, and learn from experienced open-source contributors. The Google Summer of Code is a global initiative that matches students or new contributors with experienced mentors to work on open-source projects. The goal of the program is to introduce students to open source, help them gain valuable skills under the guidance of a mentor, and encourage them to continue contributing to open-source development, thereby providing the continuous pool of talented new contributors necessary for maintaining an open-source project. This study presents an empirical analysis of pull requests created by interns during the Google Summer of Code program. We extracted and analysed 17,232 pull requests from 2,456 interns across 1,937 open-source projects. The results show that the majority of the work involves both code-intensive tasks, such as adding new features and fixing bugs, and non-code tasks, such as updating the documentation and restructuring the code base. The feedback from reviewers covers code functionality and programming logic, test coverage, error handling, code readability, and the adoption of best practices. Finally, we discuss the implications of these results for software engineering education.
- Publication: arXiv e-prints
- Pub Date: December 2024
- arXiv: arXiv:2412.13120
- Bibcode: 2024arXiv241213120P
- Keywords: Computer Science - Software Engineering