How to Talk About Machine Learning with Jupyter Notebooks

In machine learning, researchers routinely write code to preprocess data, run learning algorithms, and compute key metrics. To increase transparency, more and more supplementary material (such as datasets, code, and documentation) is being shared so that fellow researchers can replicate these experiments. Jupyter Notebooks are a valuable medium in this context – they display documentation, code, and its output (such as visualizations, tables, or log messages) side by side. Recently, Jupyter Notebooks have also been used more often in university courses. Here, students benefit from having code, its documentation, and the related exercise questions integrated into a single interactive document, and there are many options for designing appealing exercises for a course. Both in transparent science and in teaching, the author’s code is meant to run on another machine and produce the same results. This talk highlights issues that can arise during replication, along with suitable fixes. The open source application JupyterHub can be part of that strategy. While the backend of the integrated development environment runs on TUHH resources, the frontend is just a simple browser application. This relieves students of the need for up-to-date equipment for replication. Especially in times of COVID-19, this makes it easier for students to program from home.
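
The abstract does not say which replication issues and fixes the talk covers. As an illustration only, two common sources of differing results are unseeded random number generators and unrecorded package versions. The sketch below, which could sit in a notebook's first cell, is an assumption rather than the speaker's method: the helper names `set_seed`, `describe_environment`, and `sample_experiment` are hypothetical, and only Python's standard-library generator is seeded (libraries such as NumPy keep their own random state and would need separate seeding).

```python
import platform
import random
import sys
from importlib import metadata


def set_seed(seed: int) -> None:
    """Seed Python's built-in random number generator (hypothetical helper)."""
    random.seed(seed)


def describe_environment(packages):
    """Collect interpreter, platform, and package versions for the given names."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence instead of failing, so the report stays complete.
            info[name] = "not installed"
    return info


def sample_experiment(seed: int, n: int = 5):
    """Stand-in for an experiment: draws n pseudo-random numbers after seeding."""
    set_seed(seed)
    return [random.random() for _ in range(n)]


if __name__ == "__main__":
    # The package names here are examples; list whatever the notebook imports.
    print(describe_environment(["numpy", "scikit-learn"]))
    print(sample_experiment(seed=42))
```

Printing the environment report next to the results gives readers a record of the setup that produced them, and rerunning `sample_experiment` with the same seed yields the same numbers within one Python version.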