Below are detailed instructions, guidelines and recommendations for organizing a scikit-learn sprint.
This resource is available to anyone who would like to organize an open source sprint, whether for scikit-learn or for another open source library, including corporations, meetup groups, conferences, etc.
Q: What is an open source sprint?
A: It is a full-day (6 to 8 hours) hands-on session or hackathon where users convene to work on issues in an open source library. Sprints can span one or more days. Beginners typically work on issues related to documentation, fixing error messages, etc.
Below are two podcast links related to scikit-learn and open source.
Banana Data: Scikit-learn Sprints, featuring Reshama Shaikh
Now that we’ve covered how open source works, we’re looking to pull back the curtain and see who’s actually contributing. In part 2/2 of our series on open source, we sat down with Reshama Shaikh, a statistician and key organizer of scikit-learn sprints, to learn about the ups & downs of open source contributing, as well as how a sprint in Nairobi benefits Fortune 500 companies in the US.
Banana Data: Why Open Source, featuring Andreas Mueller
Open Source software such as scikit-learn, Python, and Spark forms the backbone of data science. In a two-part series, we’re covering the ins and outs of open source - and how this special type of software supports 98% of enterprise-level companies’ data science efforts. In part 1, we’re chatting with Andreas Mueller, a core contributor of scikit-learn, about the value in open source versus corporate software, and what it looks like to run and govern this type of community-written (and driven) project.
These are the key logistics necessary for the sprint:
- a Saturday (or Sunday) is preferable (depends on region); in some cases a weekday works best.
- sponsor for food
- find a core contributor to facilitate the sprint
Organizing the Event
Create a website with all information so all organizers and attendees can easily find what they need in a central place.
- Owner: sprint organizer
- Example: Bay Area WiMLDS 2019 sprint
- Note: this site is created using Google Sites. It is helpful to use a URL shortener such as TinyURL, for example: https://tinyurl.com/sf2019-sprint.
GitHub Repo for Sprint
This repository holds more detailed information for sprint day including documents, list of issues, etc.
- Owner: sprint organizer
- Example: bayarea-2019-scikit-sprint
It is helpful for sprint participants to have some experience with Python and the scikit-learn library so that they can get the most out of the event.
What information to collect on the application is at the discretion of the sprint organizer, based on regional and cultural norms. (Example: some regions may not collect gender.)
- Owner: sprint organizer
- Example: sprint application
Experience with Git is helpful but not required. Participants work in pairs, so it is likely that someone else at the event (a pair partner or a TA) can help with Git.
Scikit-learn Core Contributor
The sprints have typically been led by a scikit-learn core contributor. These are the avenues for engaging a core contributor to lead a sprint:
- The easiest option is a core contributor who lives in the city of the sprint. The list of active core contributors is available on the scikit-learn website.
- If your chapter is located in a city where conferences are held, it is possible to plan a sprint around the date when a core contributor is in town. We did that for the Bay Area 2019 sprint. Andreas Mueller was in San Francisco for ODSC West, and thus the sprint was scheduled for the Saturday after the conference.
- It is possible to have a core contributor fly into your city if a core contributor is willing to travel and funding to cover their travel is available. We did that for the Nairobi 2019 sprint: I asked on the scikit-learn mailing list, and a contributor was interested.
Funding for Contributor
For funding, the first step is to research the estimated cost of flights and of lodging for at least two days, for budgeting purposes. There are a few options for funding:
- try to find a sponsor
- connect with the scikit-learn team
Curating List of Beginner-friendly Issues for the Sprint
This list of issues is curated by the scikit-learn core contributor who will be leading the sprint.
- Owner: scikit-learn core contributor
- Example: curated issues
It’s a good idea to email the scikit-learn core contributor two weeks in advance of the sprint and remind them to curate the list. The list should be ready about one week in advance of the sprint and the link should be shared with participants so they can begin looking at the types of issues they will work on.
Selected issues are labeled "good first issue"; see the issues list.
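Beyond the curated list, participants can browse open issues carrying that label through GitHub's issue search. The sketch below is a hedged illustration, not part of any scikit-learn tooling: it assumes GitHub's REST search endpoint (`https://api.github.com/search/issues`) and the repository name `scikit-learn/scikit-learn`, and the helper names `good_first_issue_url` and `fetch_issue_titles` are hypothetical. Building the URL needs no network access; the fetch step does, and unauthenticated requests are rate-limited.

```python
import json
import urllib.parse
import urllib.request

# GitHub REST API endpoint for searching issues and pull requests.
SEARCH_URL = "https://api.github.com/search/issues"


def good_first_issue_url(repo="scikit-learn/scikit-learn", label="good first issue"):
    """Build a search URL for open issues in `repo` that carry `label`."""
    # Labels containing spaces must be quoted inside the search query.
    query = f'repo:{repo} is:issue is:open label:"{label}"'
    return SEARCH_URL + "?" + urllib.parse.urlencode({"q": query, "per_page": 50})


def fetch_issue_titles(url):
    """Fetch search results (requires network access); return (title, link) pairs."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [(item["title"], item["html_url"]) for item in data["items"]]


if __name__ == "__main__":
    url = good_first_issue_url()
    print(url)
    # Uncomment to query GitHub (unauthenticated requests are rate-limited):
    # for title, link in fetch_issue_titles(url):
    #     print(title, link)
```

This only lists labeled issues; the core contributor's curated list remains the reference for sprint day.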
Event Space Sponsorship
It is best to tap into your local network to find a venue that can host the sprint.
For food, sometimes the venue that is hosting can sponsor it. If not, you can look for a separate sponsor for refreshments. It is also possible to organize the event without providing food and have attendees bring their own lunch and snacks.
Event Helpers: Scikit-learn Experts, Teaching Assistants & Helpers
In addition to the core contributor, it is important to have experienced scikit-learn and Git users who can help answer participants’ questions. Try to recruit as many expert scikit-learn users as you can.
Helpers who are familiar with Git and with setting up virtual environments are also valuable. Developers who use scikit-learn and understand Python coding, unit testing and algorithms can mentor as well.
In addition, it is helpful to have general volunteers who can sign in attendees, set up food, take photos and tweet.
Pre-sprint Virtual Environment Set-up
Setting up the virtual environment can take some time, so some meetup groups may choose to hold a “pre-event” meetup where participants set up their working environment and are prepared for sprint day. This is especially useful where Wi-Fi is slow and could delay set-up.
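Participants can verify their set-up ahead of time with a short script that checks whether Git is on the PATH and whether key packages can be found. This is a minimal sketch under stated assumptions: the package list (`numpy`, `scipy`, `sklearn`, `pytest`) is an illustrative guess rather than scikit-learn's official requirement list, and the helper `check_environment` is hypothetical. It only checks that packages are installed; it does not build scikit-learn from source.

```python
import importlib.util
import shutil
import sys

# Assumed minimal toolset for sprint day; adjust to the official contributing guide.
REQUIRED_PACKAGES = ["numpy", "scipy", "sklearn", "pytest"]


def check_environment(packages=REQUIRED_PACKAGES):
    """Return a dict mapping each tool/package name to True if it is available."""
    # shutil.which returns None when the executable is not on the PATH.
    status = {"git": shutil.which("git") is not None}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    for name, ok in check_environment().items():
        print(f"{name:<8} {'OK' if ok else 'MISSING'}")
```

Once scikit-learn imports, `sklearn.show_versions()` prints the versions of scikit-learn and its main dependencies, which is handy to share with a helper when something goes wrong.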
Instructions for setting up a virtual environment are available in two places:
Pre-sprint Git workshop
You may want to organize an evening Git workshop meetup event prior to the sprint. Here are some curriculum options:
Donate to NumFOCUS
The sprints have typically been free to attend and have been organized by volunteers. Attendees should be encouraged to donate some amount (nominal, such as $5 to $10, or more depending on their budget) to support the open source community.
Donations can be made on the NumFOCUS website. For donations, the following information can be provided to attendees:
- Donation dedication:
- Person to inform: scikit-learn team
Marketing the event can be done through various avenues:
- Meetup group announcement / event
- Facebook groups
- Reach out to other communities in your area (PyLadies, etc.) to share the event
- Word of mouth: invite people in the community who use Python and scikit-learn
The hashtag for the sprints is #ScikitLearnSprint
Social Media Cards
Canva.com is used to create social media cards.
Here is a template email of reminders to send to participants. It’s a good idea to send several reminders leading up to the event:
- 7 days prior
- 3 days prior
- 1 day prior
You can adjust the text depending on your location and cultural preferences.
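The 7/3/1-day cadence is easy to turn into concrete send dates once the sprint date is fixed. A minimal sketch using only Python's standard `datetime` module; the helper `reminder_dates` and the example date are hypothetical.

```python
from datetime import date, timedelta

# Days before the sprint on which to send reminder emails.
REMINDER_OFFSETS = (7, 3, 1)


def reminder_dates(sprint_day, offsets=REMINDER_OFFSETS):
    """Return the dates on which to send reminders, earliest first."""
    return sorted(sprint_day - timedelta(days=n) for n in offsets)


if __name__ == "__main__":
    sprint_day = date(2019, 11, 2)  # hypothetical sprint date
    for d in reminder_dates(sprint_day):
        print(d.strftime("%A, %B %d, %Y"))
```

Keeping the offsets in one tuple makes it simple to add or drop a reminder without touching the date logic.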
The agenda typically goes as follows; it is flexible based on your preferences.

| Time | Activity |
|---|---|
| 8:00 to 8:30 am | Organizers / contributor / helpers arrive |
| 9:00 to 9:30 am | Attendees arrive (breakfast, audio/visual and computer set-up) |
| 9:30 to 9:40 am | Organizer introduces event |
| 9:40 to 10:15 am | Core contributor introduces the contributing process |
| 12:00 to 1:00 pm | Lunch |
| 1:00 to 3:30 pm | Sprinting |
| 3:30 to 3:45 pm | Break |
| 3:45 to 6:00 pm | Sprinting |
Nametags are highly encouraged because they facilitate networking. These are the recommended items to include:
- First Name (required)
- Affiliation (highly recommended)
- Last Name (optional)
- Pronoun (optional)
Sprint Organizer Introduction
The sprint organizer typically gives an introduction, which can include the following announcements. A list of reminders is posted in the GitHub repo.
- Wi-Fi instructions
- bathroom locations
- Code of Conduct (read CoC, let people know they can contact you or another organizer with any concerns)
- reminder to participants to take photos & tweet and share on social media (Twitter, LinkedIn, Instagram, etc.)
- cleaning up: we are in a borrowed space, so please clean up after yourself
- thank venue host
- thank sponsors
- making space for URGs (underrepresented groups)
- speaking: let others speak; two people speak once before you speak twice (from Write/Speak/Code)
- feedback form: remind attendees to complete the feedback form
Pair programming is highly encouraged, but not required. Attendees can find a partner sitting near them. If some people are still looking for a partner, the sprint organizer can connect people at the event. Most attendees will work in pairs.
It is important to share the sprint on social media. It raises the visibility of your organization, scikit-learn, Python and open source. It also helps with planning future events, because public awareness of past events makes it easier to procure event space and sponsorship.
Inclusivity & Cultural Considerations
Date of Sprint
Sprints should be scheduled around major national and religious holidays. A Saturday is ideal in many cities, but a weekend day may not be ideal in France. In Israel, the weekend is Friday and Saturday, and the work week begins on Sunday. In some locations a weekday event may work better. Explore your cultural and business norms to find a day that works for the participants and the contributor. No day will work for everyone, but pick one that suits most of the organizers and participants and on which the venue is available.
It is important to take photographs and share on social media to bring visibility to women in data science, scikit-learn and open source. It is also important to be respectful of people’s preferences and expectations for photography.
If you can find sponsorship for childcare, that is wonderful. If not, there are some other options, at the discretion of the sprint organizer:
- Permit an attendee’s partner to attend to help take care of the child
- Allow sprint participants to bring children to the event
Check with the venue and see if the following can be arranged:
- Private room where a woman can pump
- Access to a refrigerator to store milk
It is helpful to arrange this accommodation before opening applications for the sprint and to include it on the sprint website. That way, an attendee does not have to make a special request, and more people who need the accommodation can feel able to apply.
Providing food that accommodates various dietary needs is inclusive. These are some dietary needs to consider:
- dairy free
- gluten free
Follow-up on PRs
While the sprint is a full-day event, most PRs (pull requests) will not be merged at the event. PRs require review by core contributors, and revising them in response to that review is part of the typical back and forth. It’s important to communicate this to attendees. Both sprint participants and sprint organizers should set aside some resources and time to ensure that the work begun during the sprint is completed.
Aim to complete the outstanding PRs within 60 days of the sprint.
The sprint organizer or an attendee can write a blog post about their experience. It’s a good way to document the event, share it with the community, and help procure future sponsorships. There can be multiple blog posts, as different attendees have different perspectives. Blog posts can be published on any of the following platforms:
- Author’s personal blog
- Medium blog
- Publish on the sprint event website
- Publish on your organization website
Here are examples of blogs from previous events:
- Bay Area 2019
  - Of Coding Sprints and Half Marathons (Valentina Borghesani)
  - Highlights from the 2019 SF WiMLDS scikit-learn Open-Source Sprint (Katarina Slama)
- New York City
  - 2019: Sprinting towards a more equitable future: The 2019 NYC WiMLDS Scikit-learn Sprint (Kelly Carmody)
  - 2018: Highlights From The 2018 NYC WiMLDS Scikit Sprint (Reshama Shaikh)
  - 2017: 2017 WiMLDS scikit-learn Sprint (Noemi Derzsy)
- Nairobi 2019
  - Highlights From The 2019 Nairobi WiMLDS Scikit Sprint (Mariam Haji)
  - scikit-learn sprint at Nairobi, Kenya (Adrin Jalali)
Here are examples of collated Twitter Moments:
A feedback form is available. Remind attendees at the end of the event to complete it, and include a link to the form in the event follow-up email. Share the results of the feedback form with everyone involved in organizing the sprint, including the core contributor. Comments from the sprint can be included in the Sprint Impact Report.
- Owner: sprint organizer
- Example: feedback survey
- Getting started: Rather than creating the feedback form from scratch, you can email me; I can copy a previous form and rename it for your city’s event, and you can then edit it.
Below are some Sprint Impact Reports I have written. Feel free to copy the template and edit it for your sprint:
- Nairobi WiMLDS 2019 Scikit-learn Impact Report (Aug 2019)
- Impact Report for 2017-2018 NYC WiMLDS Scikit-Learn Sprints (Jan 2019)
It is helpful to include:
- any metrics
- challenges or issues encountered
- what worked well
- what could be better
Aim to publish the impact report within 60 days of the sprint.
Below are resources. Sprint organizers are free to copy the templates and edit them for their sprints.
| Resource | Example / template |
|---|---|
| Event website (Google Sites) | https://tinyurl.com/sf2019-sprint |
| Application (Google Forms) | Bay Area sprint application |
| Marketing (social media cards) | Canva.com; Save the Date card |
| Marketing (tweets) | Save the date |
| Feedback Form (Google Forms) | Feedback Survey |
| Impact report | Nairobi WiMLDS 2019 Sprint Impact Report |