Impact Report for WiMLDS Scikit-learn Sprints

Scikit-learn / WiMLDS Sprint Background

Scikit-learn, the machine learning library for Python, was first released in early 2010 by developers in Paris, France. To date, there is only one woman among the top 100 contributors to this library.

A 2013 study found that only 11% of open source contributors were women. A 2016 gender-inferred analysis examining the top 100 contributors for various programming languages found that just 2% of contributors to Python libraries on GitHub were women.

To address this gender imbalance for the scikit-learn library, Andreas Mueller, a core contributor, initiated an open source sprint in New York City with the local chapter of Women in Machine Learning and Data Science (WiMLDS). The first sprint was held in March 2017 and the second in September 2018. This report summarizes the impact of the two events.

The Sprint Events

Most attendees at these sprints were new to open source. Before each event, Andreas (Andy) identified issues labeled “easy” or “good first issue” so that participants could review and become familiar with them. Preparation for the event also included reviewing the Contributing documentation.

Impact Report for WiMLDS Scikit-learn Sprints

  | Sprint 2017 | Sprint 2018
Report date | 11-Jan-2019 | 11-Jan-2019
Sprint date | 04-Mar-2017 | 29-Sep-2018
Location | New York, NY | New York, NY
Open source library | scikit-learn | scikit-learn
Sprint repository link | nyc-2017-scikit-sprint | nyc-2018-scikit-sprint
Facilitator | Andreas Mueller | Andreas Mueller
Organizer | Reshama Shaikh | Reshama Shaikh
Teaching Assistants | Vighnesh Nandan Birodkar / Ritesh Bansal | Theodora Hinkle / Nicolas Hug
PULL REQUESTS (PRs) | |
PRs [MRG] at sprint (a) | 4 | 4
PRs [MRG] post-sprint (w/o follow-up) (b) | 1 | 4
PRs [MRG] post-sprint (w/ follow-up) (c) | 0 | 8
TOTAL PRs MERGED (d) | 5 | 16
PRs open (e) | 4 | 4
PRs closed (by merged PRs) (f) | 10 | 15
Attendees: Initial Registrations | 60 | 63
Attendees: Participated | ~30 | 35
Attendee List | 2017 | 2018
Sponsoring venue | Stack Exchange | Stack Exchange
Event posting | 2017 meetup event | 2018 meetup event
Blog | by Noemi Derzsy | by Reshama Shaikh

Notes

  • (a) The number of pull requests merged during the sprint day.
  • (b) The number of pull requests merged after the sprint that participants completed on their own initiative, without follow-up.
  • (c) The number of pull requests merged post-sprint after follow-up by the sprint organizer.
  • (d) The total number of PRs merged, (a) + (b) + (c), which provides one dimension of the impact of the sprint.
  • (e) The number of PRs from the sprint that are still open.
  • (f) The number of PRs that were opened but then closed by the successful merge of another PR.
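
As a quick check of how the totals in row (d) relate to rows (a) through (c), the arithmetic can be sketched in a few lines of Python. The counts are copied from the table above; the dictionary keys and the helper name `total_merged` are made up for this illustration.

```python
# Merged-PR counts copied from the summary table.
# The key names are illustrative labels for rows (a), (b) and (c).
merged = {
    "2017": {"a_at_sprint": 4, "b_post_no_followup": 1, "c_post_with_followup": 0},
    "2018": {"a_at_sprint": 4, "b_post_no_followup": 4, "c_post_with_followup": 8},
}


def total_merged(counts):
    """Row (d): the sum of rows (a), (b) and (c) for one sprint."""
    return sum(counts.values())


totals = {year: total_merged(counts) for year, counts in merged.items()}

for year, total in totals.items():
    print(f"{year}: {total} PRs merged")
```

The sums come to 5 for 2017 and 16 for 2018, matching the TOTAL PRs MERGED row.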

Impact Summary for 2017

In 2017, 5 PRs were merged:

  • 4 PRs were merged at the sprint
  • 1 PR was merged post-sprint without any follow-up
    • The PR merged post-sprint was by Sergul Aydore. After attending this sprint, Sergul then went on to participate in the August 2018 scikit-learn core sprint for advanced contributors in Paris. Sergul states:

      Participating in the March 2017 sprint helped me learn the basics and I was able to contribute to more complicated PRs in the August 2018 sprint.

  • No follow-up of open PRs was conducted.

Impact Summary for 2018

A total of 16 PRs were merged in as a result of the 2018 sprint:

  • 4 were merged at the sprint
  • 4 were updated and merged post-sprint by attendees who submitted of their own accord, without any follow-up.
  • To date, 8 PRs were completed and merged after follow-up by the sprint organizer (me) or by members of the WiMLDS community. None of the original sprint participants merged a PR after follow-up.

Non-measurable Impact

Aside from the number of PRs that were merged, the open source sprints had an impact that cannot be quantified. Some examples include:

  • learning to set up a virtual environment
  • using Git (fork, clone, branch, fetching another’s PR)
  • introduction to tests such as: flake8 (linting, formatting), pytest, “continuous integration”
  • navigating through the codebase structure of scikit-learn
  • digging into functions, learning about errors
  • learning about unit tests
  • interacting with contributors on GitHub
  • learning, in general
  • networking
  • building confidence (making a dent in “imposter syndrome”)
  • having fun

These sprints were held in New York City. The 2018 event was advertised on Twitter. Two sprint participants from Europe who were visiting NYC that week learned of the event via Twitter and joined: Sandra Mitrovic (Belgium) and Alice Martin (France).

Scikit-learn Issues and PRs

This is a list of issues that were resolved and PRs that were opened/closed/merged in:

Lessons Learned

  • Create a tracking spreadsheet during the event: the spreadsheet was created after the event, and having it available during the sprint would have saved time in identifying which PRs were opened and who the contributors were.
  • Label issues: issues worked on during the sprint could be tagged as “sprint” and include “#wimlds” so they can be found and tracked.
  • Duration: the sprint was 6 hours (10am to 4pm); extend it to 6pm for more working time.
  • Follow-up: follow-up began one month post-sprint, which was too long after the event. Sprint participants were contacted via event email, GitHub pinging, and individual email via meetup account and personal email, where available. The majority of sprint participants were non-responsive to follow-up communication, and none of those contacted completed their PR.
    • Contributing documentation: As a result of these stalled PRs, the documentation was updated to include verbiage for addressing stalled PRs.
  • Goals: set a goal to complete all open PRs from the sprint within 30 to 60 days and issue a sprint impact report within 60 days.
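
The tracking spreadsheet suggested in the first lesson could be as simple as a CSV file with one row per PR. The sketch below is only an illustration: the column names, the `SprintPR` class, and the sample rows are hypothetical placeholders, not the spreadsheet actually used for these sprints.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SprintPR:
    """One row of a hypothetical sprint tracking sheet."""
    pr_number: int
    contributor: str
    issue_number: int
    status: str  # "open", "merged", or "closed"


def to_csv(rows):
    """Serialize tracking rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(SprintPR)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


def open_prs(rows):
    """Rows that still need follow-up after the sprint."""
    return [row for row in rows if row.status == "open"]


# Placeholder rows only; these are not real PRs or contributors.
rows = [
    SprintPR(pr_number=1, contributor="example-user-a", issue_number=10, status="merged"),
    SprintPR(pr_number=2, contributor="example-user-b", issue_number=11, status="open"),
]

sheet = to_csv(rows)
print(sheet)
print("Needs follow-up:", [row.pr_number for row in open_prs(rows)])
```

Filling in a sheet like this as PRs are opened during the sprint would make the post-sprint follow-up list available immediately, rather than reconstructing it afterwards.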

Acknowledgments

Special thanks to the following Reviewers for their speedy review, assistance and patience:

Special thanks to these members of the NYC WiMLDS community for following up on “abandoned” or “stalled” PRs:


References


Addendum
