Highlights from the 2018 NYC WiMLDS Scikit Sprint


Sprint Repo

The 2nd Annual NYC WiMLDS / Scikit Sprint was held on Saturday, September 29, 2018, at Stack Exchange in New York City. This is our repository for all items related to the 2018 NYC WiMLDS Scikit Sprint.


History of the Scikit-Learn Python Library

This project was started in 2007 as a Google Summer of Code project by David Cournapeau. Later that year, Matthieu Brucher started work on this project as part of his thesis.

In 2010, Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort, and Vincent Michel of INRIA took leadership of the project and made the first public release on February 1, 2010. Since then, releases have followed a roughly three-month cycle, and a thriving international community has led the development.

This is how Andreas Mueller describes becoming involved in scikit-learn:

While working on my Ph.D. in computer vision and learning, the scikit-learn library became an essential part of my toolkit. I was an ardent user of the library, and I wanted to partake in its advancement. My initial participation in open source began in 2011 at the NIPS conference in Granada, Spain, where I had attended a scikit-learn sprint. The scikit-learn release manager at the time had to leave, and the project leads asked me to become release manager; that’s how it all got started.

Read more in this interview with Andreas Mueller.

Advertising the Sprint

A few weeks before the sprint, we still had 30 spots open. A tweet advertising the sprint then reached so many scikit-learn users that every spot was taken.

We even had two attendees who live in Europe. They happened to be visiting NYC that week, so they joined the event on Saturday! I worked with one of them, Alice, who came all the way from Paris. I was surprised and impressed that they chose to spend a Saturday in New York City at the sprint.

The Sprint

Book Signing

Andy (Andreas Mueller) gave away signed copies of his book, Introduction to Machine Learning with Python.

Testing Our Pull Request

After working on the issue and committing changes to our branch, we ran the scikit-learn test suite locally with this command:

pytest sklearn

It is always exciting to see (most of) the tests pass.

Pull Requests Summary

It looks like 24 pull requests were submitted during the sprint.

Now I know why it’s called a sprint!

It was more challenging than I had anticipated. We spent most of the day going through the Python code base, trying to understand the maze of functions and how they relate to one another. We explored the source files to determine where to place the error check for the issue we picked.
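For anyone who hasn't contributed before: a fix like ours typically pairs a check in the library code that raises an informative error with a test confirming that the error is raised. Here is a minimal sketch of that pattern. The helper `check_n_neighbors` and the test names are hypothetical and made up for illustration. They are not the issue we worked on or scikit-learn's own code. The test uses `pytest.raises`, the pytest context manager that scikit-learn's tests commonly use to check for expected exceptions.

```python
import numbers

import pytest


def check_n_neighbors(n_neighbors):
    """Hypothetical validation helper (illustration only, not scikit-learn code).

    Raises a ValueError with a readable message when the input is not a
    positive integer; otherwise returns the value unchanged.
    """
    if not isinstance(n_neighbors, numbers.Integral) or n_neighbors <= 0:
        raise ValueError(
            f"n_neighbors must be a positive integer, got {n_neighbors!r}"
        )
    return n_neighbors


def test_check_n_neighbors_rejects_non_positive():
    # pytest.raises fails this test unless the block raises a ValueError
    # whose message matches the regular expression given in `match`.
    with pytest.raises(ValueError, match="must be a positive integer"):
        check_n_neighbors(0)


def test_check_n_neighbors_accepts_valid_value():
    # Valid input passes through untouched.
    assert check_n_neighbors(5) == 5
```

If these tests are saved in a file whose name matches pytest's default `test_*.py` pattern, the same `pytest` command collects them. Because `match` is searched against the error message as a regular expression, the test also guards the wording that users will see.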

We Made It!

What’s Next?

It looks like we still have some work to do on the pull request that my sprint partner, Alice, and I submitted.

Participating in a Scikit Sprint

If you would like to participate in a scikit-learn sprint, the history below suggests that these three cities are your best bets:

  1. Paris, France: where the scikit-learn library was largely developed and where many of its contributors live
  2. New York, New York: where Andreas Mueller resides
  3. Austin, Texas: where the annual SciPy conference takes place

Listing of Scikit Sprints

There is a list on the scikit-learn wiki.

  • 2019
  • 2018
    • WiMLDS: New York City (Sep)
    • SciPy: Austin (open sprint, for new contributors) (Jul)
    • Paris: core sprint, for advanced contributors (Jul)
    • Two Sigma: New York City (Jun)
    • UC Berkeley: Berkeley (May 28 to Jun 2)
  • 2017
  • 2016
  • 2015
    • SciPy: Austin (Jul)
    • ODSC: San Francisco (Nov)
    • Criteo: Paris (Oct)
    • PyData: Paris (Apr)
  • 2014
    • EuroSciPy: Cambridge (Aug)
    • INRIA, Criteo, La Paillasse, Tinyclues: Paris (Jul)
    • Cloudera: San Francisco (Feb)
  • 2013
    • Paris (Jul)
    • SciPy: Austin (Jun)
  • 2012
    • EuroSciPy: Brussels (Aug)
    • PyCon France (Jul)
    • SciPy: Austin (Jul)
  • 2011
  • 2010
    • Paris (Sep)
    • Paris (Jun)
    • Paris (Mar)
    • First release of scikit-learn (Feb)

