How To Organize A Scikit-learn Sprint


Below are detailed instructions, guidelines and recommendations for organizing a scikit-learn sprint.

This resource is available to anyone (corporations, meetup groups, conferences, etc.) who would like to organize an open source sprint, whether for scikit-learn or for any other open source library.

Intro

Q: What is an open source sprint?
A: It is a full-day (6 to 8 hours) hands-on session or hackathon, where users convene and work on issues in an open source library. Sprints can span one or more days. Beginners typically work on issues related to documentation, fixing error messages, etc.

Background

Below are two podcast links related to scikit-learn and open source.

Banana Data: Scikit-learn Sprints, featuring Reshama Shaikh

Now that we’ve covered how open source works, we’re looking to pull back the curtain and see who’s actually contributing. In part 2/2 of our series on open source, we sat down with Reshama Shaikh, a statistician and key organizer of scikit-learn sprints, to learn about the ups & downs of open source contributing, as well as how a sprint in Nairobi benefits Fortune 500 companies in the US.

Banana Data: Why Open Source, featuring Andreas Mueller

Open source software such as scikit-learn, Python, and Spark forms the backbone of data science. In a two-part series, we’re covering the ins and outs of open source - and how this special type of software supports 98% of enterprise-level companies’ data science efforts. In part 1, we’re chatting with Andreas Mueller, a core contributor of scikit-learn, about the value in open source versus corporate software, and what it looks like to run and govern this type of community-written (and driven) project.

Key Logistics

These are the key logistics necessary for the sprint:

  • date
  • venue
  • a Saturday (or Sunday) is preferable (depends on region); in some cases a weekday works best.
  • sponsor for food
  • a core contributor to facilitate the sprint

Organizing the Event

Sprint Website

Create a website with all information so all organizers and attendees can easily find what they need in a central place.

GitHub Repo for Sprint

This repository holds more detailed information for sprint day including documents, list of issues, etc.

Sprint Application

It is helpful if sprint participants have some experience with Python and the scikit-learn library so that they can learn from this event.

What information to collect on the application is at the discretion of the sprint organizer, based on regional and cultural norms. (Example: some regions may not collect gender.)

Git

Git is helpful but not required. Participants will work in pairs, so it is likely that someone else at the event (pair partner or TA) can help with Git.

Scikit-learn Core Contributor

The sprints have typically been led by a scikit-learn core contributor. These are the avenues for engaging a core contributor to lead a sprint:

  1. If there are core contributors residing in the city of the sprint, that is the easiest. The active list of core contributors is available on the scikit-learn website.
  2. If your chapter is located in a city where conferences are held, it is possible to plan a sprint around the date when a core contributor is in town. We did that for the Bay Area 2019 sprint. Andreas Mueller was in San Francisco for ODSC West, and thus the sprint was scheduled for the Saturday after the conference.
  3. It is possible to have a core contributor fly into your city, if a core contributor and funding to cover their travel are available. We did that for the Nairobi 2019 sprint. I asked on the scikit-learn mailing list and a contributor was interested.

Funding for Contributor

For funding, the first step is to research the estimated cost of flights and lodging for at least two days, for budgeting purposes. There are a few options for funding:

  1. Try to find a sponsor
  2. Connect with the scikit-learn team

Curating List of Beginner-friendly Issues for the Sprint

This list of issues is curated by the scikit-learn core contributor who will be leading the sprint.

It’s a good idea to email the scikit-learn core contributor two weeks in advance of the sprint and remind them to curate the list. The list should be ready about one week in advance of the sprint and the link should be shared with participants so they can begin looking at the types of issues they will work on.

Issues selected are labeled "easy" or "good first issue": issues list
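
As a rough illustration of how such a list could be pulled up, the sketch below builds GitHub REST API URLs for open issues carrying each of the two labels mentioned above. It assumes the sprint targets the main `scikit-learn/scikit-learn` repository and that the label names are spelled as in this post; the helper name `issues_url` is made up for this example. No network request is made.

```python
from urllib.parse import urlencode

# Assumption: the sprint targets the main scikit-learn repository on GitHub.
REPO = "scikit-learn/scikit-learn"


def issues_url(label, repo=REPO, state="open"):
    """Build a GitHub REST API URL listing issues in `repo` that carry `label`."""
    query = urlencode({"labels": label, "state": state})
    return f"https://api.github.com/repos/{repo}/issues?{query}"


# One URL per label: a single comma-separated `labels` value would only match
# issues that carry every listed label at once.
# Assumption: label spellings are taken from the text; check the repository.
urls = [issues_url(label) for label in ("easy", "good first issue")]
for url in urls:
    print(url)
```

The core contributor's curated list remains the source of truth for the sprint; a query like this is only a starting point. Note that this endpoint also returns pull requests alongside issues.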

Event Space Sponsorship

It is best to tap into your local network and find a venue that can host.

Food Sponsorship

For food, sometimes the venue that is hosting can sponsor it. If not, you can look for a separate sponsor for refreshments. It is also possible to organize the event without providing food and have attendees bring their own lunch and snacks.

Event Helpers: Scikit-learn Experts, Teaching Assistants & Helpers

In addition to the core contributor, it is important to have experienced scikit-learn and Git users who can help answer participants' questions. Try to recruit as many expert scikit-learn users as you can.

Helpers who are familiar with Git and with setting up a virtual environment are also valuable. Developers who use scikit-learn and understand Python coding, unit testing and algorithms will be able to mentor as well.

In addition, it is helpful to have general volunteers who can help with signing in attendees, setting up food, taking photos and tweeting.

Pre-sprint Virtual Environment Set-up

Optional

Setting up the virtual environment can take some time, and some meetup groups may choose to have a “pre-event” meetup where participants can set up their working environment and be prepared for sprint day. This is also useful where wi-fi may be slow and cause delays in set-up.
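
For a pre-event meetup, a small readiness check can help participants confirm their environment before sprint day. The sketch below is only an illustration: the minimum Python version and the package list are assumptions chosen for the example, and the helper names are hypothetical. The real requirements should come from the scikit-learn contributing guide.

```python
import importlib.util
import sys

# Assumption: illustrative values only; take the real requirements from the
# scikit-learn contributing guide for the version you are sprinting on.
MIN_PYTHON = (3, 8)
REQUIRED_MODULES = ("numpy", "scipy", "joblib")


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(min_python=MIN_PYTHON, required=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is needed")
    problems.extend(f"missing package: {name}" for name in missing_modules(required))
    return problems


# Print any problems found in the currently active environment.
for problem in check_environment():
    print(problem)
```

Participants would run this inside the activated virtual environment; no output means the checks passed.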

Instructions for setting up a virtual environment are available in two places:

Pre-sprint Git workshop

Optional

You may want to organize an evening Git workshop meetup event prior to the sprint. Here are some curriculum options:

Donations

Optional; strongly encouraged

The sprints have typically been free to attend and are organized by volunteers. Attendees should be encouraged to donate some amount (nominal, such as $5 to $10, or more depending on their budget) to support the open source community.

Donations can be made on the NumFOCUS website. For donations, the following information can be provided to attendees:

  • Donation dedication: scikit-learn
  • Person to inform: scikit-learn team

Marketing

Marketing the event can be done through various avenues:

  1. Meetup group announcement / event
  2. Twitter
  3. LinkedIn
  4. Slack
  5. Facebook groups
  6. Reach out to other communities in your area (PyLadies, etc.) to share the event
  7. Word of mouth: invite people in the community who use Python and scikit-learn

Hashtags

The hashtag for the sprints is #ScikitLearnSprint

Social Media Cards

Canva.com is used to create social media cards.

Preparation Emails

Here is a template reminder email to send to participants. It’s a good idea to send several reminders leading up to the event:

  • 7 days prior
  • 3 days prior
  • 1 day prior

You can adjust the text depending on your location and cultural preferences.
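
The reminder schedule above is simple date arithmetic. A minimal sketch, assuming a hypothetical sprint date and a made-up helper name `reminder_dates`:

```python
from datetime import date, timedelta


def reminder_dates(sprint_day, days_before=(7, 3, 1)):
    """Return the dates on which to send reminders, earliest first."""
    return [sprint_day - timedelta(days=n) for n in sorted(days_before, reverse=True)]


# Hypothetical sprint date, used only for illustration.
sprint_day = date(2024, 6, 15)
for when in reminder_dates(sprint_day):
    print(when.isoformat())
```

Passing a different `days_before` tuple adjusts the schedule if your group prefers more or fewer reminders.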


Sprint Day

Agenda

The agenda typically goes as follows. It is flexible based on your preferences.

Time               Description
8:00 to 8:30am     Organizers / Contributor / Helpers arrive
9:00 to 9:30am     Attendees arrive (breakfast, audio/visual and computer set-up)
9:30 to 9:40am     Organizer introduces event
9:40 to 10:15am    Core contributor introduces Contributing Process
10:15am            Sprinting
12:00 to 1:00pm    Lunch
1:00 to 3:30pm     Sprinting
3:30 to 3:45pm     Break
3:45 to 6:00pm     Sprinting

Nametags

Nametags are highly encouraged. They facilitate networking. These are the recommended items to include:

  • First Name (required)
  • Affiliation (highly recommended)
  • Last Name (optional)
  • Pronoun (optional)

Sprint Organizer Introduction

The sprint organizer typically gives an introduction, which can include the following announcements. There is a list of reminders posted in the GitHub repo.

  • wifi instructions
  • bathroom locations
  • Code of Conduct (read CoC, let people know they can contact you or another organizer with any concerns)
  • reminder to participants to take photos & tweet and share on social media (Twitter, LinkedIn, Instagram, etc.)
  • cleaning up: We are in a borrowed space, please clean up after yourself
  • thank venue host
  • thank sponsors
  • space for URGs (underrepresented groups)
    • Speaking: let others speak. Two people speak once before you speak twice (from Write/Speak/Code)
  • feedback form: ask attendees to complete it

Pair Programming

Pair programming is highly encouraged, but not required. Attendees can find a partner sitting near them. If some people are still looking for a partner, the sprint organizer can connect people at the event. Most attendees will work in pairs.

Social Media

It is important to share the sprint on social media. It raises the visibility of your organization, scikit-learn, Python and open source. It also helps in the planning of future events. Raising public awareness of the event aids in procuring event space and sponsorship.


Inclusivity & Cultural Considerations

Date of Sprint

Sprints should be scheduled to avoid major national and religious holidays. A Saturday is ideal in many cities, but a weekend day may not be ideal in France. In Israel, the weekend is Friday and Saturday, and the work week begins on Sunday. In some locations a weekday event may be a better fit. Explore your cultural and business norms to find a day that works for the participants and the contributor. No day will work for everyone, so pick one that works for most of the organizers and participants and fits venue availability.

Photography

It is important to take photographs and share on social media to bring visibility to women in data science, scikit-learn and open source. It is also important to be respectful of people’s preferences and expectations for photography.

Childcare

Optional

If you can find sponsorship for childcare, that is wonderful. If not, there are some other options, at the discretion of the sprint organizer:

  1. Permit an attendee's partner to attend and help take care of the child
  2. Allow sprint participants to bring children to the event

Mother’s Room

Check with the venue and see if the following can be arranged:

  1. Private room where a woman can pump
  2. Access to a refrigerator to store milk

It is helpful to arrange this accommodation prior to opening up applications for the sprint and include it on the sprint website. When done this way, an attendee does not have to make a special request and this accommodation gives more applicants the option to apply.

Food

Having food that is accessible to people with various dietary considerations is inclusive. These are some dietary needs to consider:

  1. vegetarian
  2. dairy free
  3. gluten free
  4. etc.

Post-sprint

Follow-up on PRs

While the sprint is a full-day event, most PRs (pull requests) will not be merged at the event. PRs require review by core contributors, and making changes is part of the typical back and forth. It’s important to communicate that to attendees. Both sprint participants and sprint organizers should set aside some time and resources to ensure that the work begun during the sprint is completed.

Aim to complete the outstanding PRs within 60 days of the sprint.
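
One lightweight way to keep track of the follow-up is to note the 60-day target date and which PRs are still open. The sketch below uses a hypothetical sprint date and made-up PR numbers and states; the helper names are illustrative only.

```python
from datetime import date, timedelta


def followup_deadline(sprint_day, days=60):
    """Date by which outstanding sprint PRs should ideally be wrapped up."""
    return sprint_day + timedelta(days=days)


def outstanding(prs):
    """Return the PR numbers that are still open."""
    return [number for number, state in prs.items() if state == "open"]


# Hypothetical sprint date and PR numbers/states, for illustration only.
sprint_day = date(2024, 6, 15)
prs = {101: "merged", 102: "open", 103: "closed", 104: "open"}

print(followup_deadline(sprint_day))
print(outstanding(prs))
```

The same target date can be reused when planning the impact report.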

Blogs

The sprint organizer or an attendee can write a blog about their experience. It’s a good way to document the event, share it with the community, and help procure future sponsorships. There can be multiple blogs, as different attendees have different perspectives. Blogs can be published on any of the following platforms:

  1. Author’s personal blog
  2. Medium blog
  3. Sprint event website
  4. Your organization’s website

Here are examples of blogs from previous events:

Twitter Moments

Here are examples of collated Twitter Moments:

Feedback Form

There is a Feedback Form available. Remind attendees at the end of the event to complete it. Also, include a link to the form in the event follow-up email. Share the results of the feedback form with everyone involved in organizing the sprint, including the core contributor. Comments from the sprint can be included in the Sprint Impact Report.

  • Owner: sprint organizer
  • Example: feedback survey
  • Getting started: Rather than creating the feedback form from scratch, you can email me and I can copy a previous form and rename it for your city's event, and you can then edit it.

Impact Report

Below are some Sprint Impact Reports I have written. Feel free to copy the template and edit for your sprint:

It is helpful to include:

  • any metrics
  • challenges or issues encountered
  • what worked well
  • what could be better

Aim to publish the impact report within 60 days of the sprint.


Resources

Below are resources. Sprint organizers are free to copy the templates and edit for their sprints.

Resource                         Reference
Repository (GitHub)              https://github.com/WiMLDS/bayarea-2019-scikit-sprint/
Event website (Google Sites)     https://tinyurl.com/sf2019-sprint
Application (Google Forms)       Bay Area sprint application
Marketing (social media cards)   Canva.com: Save the Date card, Sponsors, Event collage
Marketing (tweets)               Save the date, Applications open
Feedback Form (Google Forms)     Feedback Survey
Impact report                    Nairobi WiMLDS 2019 Sprint Impact Report
