Notebooks come alive when you add interactive widgets. Users can visualize and control changes in the data and the model, and learning becomes an immersive, fun experience.
You have coded in Jupyter, the ubiquitous notebook platform for coding and testing cool ideas in virtually all major programming languages. You love it, you use it regularly.
But you want more control: you want to change variables with a simple swipe of your mouse, not by writing a for-loop. What should you do? You can use IPython widgets. Read on…
What is a Python Widget?
Project Jupyter was born out of the IPython Project in 2014 and evolved rapidly to support interactive data science and scientific computing across all major programming languages. It has undoubtedly had a major impact on how data scientists quickly test and prototype their ideas and showcase their work to peers and the open-source community.
However, learning and experimenting with data become truly immersive when the user can interactively control the parameters of the model and see the effect in (almost) real time. Most of the common rendering in Jupyter is static. However, there is a big effort to introduce elements called ipywidgets, which render fun and interactive controls in the Jupyter notebook.
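To give a feel for what such a control looks like, here is a minimal sketch using the `interact` helper from ipywidgets. The function `describe_line` and the wrapper `show_controls` are hypothetical names made up for this illustration; they are not part of the notebook discussed in this article.

```python
def describe_line(slope=1.0, intercept=0.0):
    """Return a text description of the line y = slope * x + intercept."""
    return f"y = {slope:.1f} * x + {intercept:.1f}"


def show_controls():
    """Render one slider per argument; call this from a notebook cell."""
    # Imported here so describe_line stays usable where ipywidgets is absent.
    from ipywidgets import interact

    # A (min, max, step) tuple of floats is shorthand for a float slider.
    return interact(describe_line,
                    slope=(-5.0, 5.0, 0.5),
                    intercept=(-10.0, 10.0, 1.0))
```

Calling `show_controls()` in a notebook cell should display two sliders and re-run `describe_line` every time one of them moves, showing the updated return value underneath.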
In a previous article, I demonstrated a simple curve-fitting exercise using basic widget controls. Please read that article for instructions on installing the widget package. This article extends that idea to interactive machine learning techniques.
Interactive Linear Regression
We demonstrate simple linear regression on a single variable using interactive control elements. Note that the idea can easily be extended to complex multivariate, nonlinear, kernel-based regression. However, for simplicity of visualization, we stick to the single-variable case in this demo.
The boilerplate code is available in my GitHub repository. We show the interactivity in two stages. First, we show the data generation process as a function of the input variables and the statistical properties of the associated noise. Here is a video of the process, where the user can dynamically generate and plot the nonlinear function using simple slide-bar controls.
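As a rough illustration of this first stage, the sketch below generates noisy samples from a polynomial with NumPy. The function name `generate_data`, its parameters, and the polynomial coefficients are assumptions chosen for this example; the repository code may be organized differently.

```python
import numpy as np


def generate_data(n_samples=50, x_min=-5.0, x_max=5.0, noise_sd=1.0, seed=0):
    """Return x, the noise-free curve, and the noisy observations."""
    rng = np.random.default_rng(seed)
    x = np.linspace(x_min, x_max, n_samples)

    # Illustrative 4th-degree polynomial, highest power first
    # (these coefficients are made up for the example).
    coefficients = [0.05, -0.3, 0.5, -2.0, 1.0]
    y_true = np.polyval(coefficients, x)

    # Additive Gaussian noise with zero mean and standard deviation noise_sd.
    y_noisy = y_true + rng.normal(0.0, noise_sd, size=n_samples)
    return x, y_true, y_noisy
```

Because every knob is an ordinary keyword argument, a function like this can be handed to `interact` from ipywidgets, with sliders for arguments such as `n_samples` and `noise_sd`, and a plotting call added inside it to redraw the curve on each change.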
Here, the generating function (aka ‘ground truth’) is a 4th-degree polynomial and the noise comes from a Gaussian distribution. Next, we write a linear regression function using scikit-learn’s polynomial feature generation and pipeline methods. A detailed step-by-step guide to such a machine learning pipeline is given here. We wrap the whole function inside another interactive control widget so that we can dynamically alter the various parameters of the linear model.
We introduce interactive controls for the following hyperparameters.
- Model complexity (degree of polynomial)
- Regularization type — LASSO or Ridge
- Size of the test set (fraction of total sample data used in test)
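The three controls above could be wired to a scikit-learn pipeline roughly as follows. This is a sketch under stated assumptions, not the notebook's actual code: `fit_polynomial` and `show_controls` are hypothetical names, the `StandardScaler` step and the fixed `alpha` default are choices made for this example, and the slider ranges are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def fit_polynomial(x, y, degree=3, regularization="ridge",
                   test_fraction=0.3, alpha=1.0, seed=0):
    """Fit a regularized polynomial model; return it with train/test R^2."""
    X = np.asarray(x, dtype=float).reshape(-1, 1)  # one feature column
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_fraction, random_state=seed)

    if regularization == "ridge":
        regressor = Ridge(alpha=alpha)
    elif regularization == "lasso":
        regressor = Lasso(alpha=alpha, max_iter=10000)
    else:
        raise ValueError("regularization must be 'ridge' or 'lasso'")

    # Expand x into polynomial features, scale them, then fit the linear model.
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        regressor)
    model.fit(X_train, y_train)
    return model, model.score(X_train, y_train), model.score(X_test, y_test)


def show_controls(x, y):
    """Render the three hyperparameter controls; call from a notebook cell."""
    # Imported here so fit_polynomial stays usable where ipywidgets is absent.
    from ipywidgets import Dropdown, FloatSlider, IntSlider, interact

    def report(degree, regularization, test_fraction):
        _, train_r2, test_r2 = fit_polynomial(
            x, y, degree, regularization, test_fraction)
        print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")

    return interact(
        report,
        degree=IntSlider(min=1, max=10, step=1, value=3),
        regularization=Dropdown(options=["ridge", "lasso"], value="ridge"),
        test_fraction=FloatSlider(min=0.1, max=0.5, step=0.05, value=0.3))
```

Keeping the fitting logic in a plain function and the widget wiring in a thin wrapper means the model code can be checked on its own, while the notebook only adds the controls on top.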
The following video shows the user interacting with the linear regression model. Note how the test and training scores are also updated dynamically to show a trend of over-fitting or under-fitting as the model complexity changes. One can go back to the data generation control and increase or decrease the noise magnitude to see its impact on the quality of the fit and on the bias/variance trade-off.
We presented a brief overview of a Jupyter notebook with embedded interactive control objects, which allow the user/programmer to dynamically play with the generation and modeling of a data set. The current demo allows the user to introduce noise, change model complexity, and examine the impact of regularization, all on the fly, and to see the resulting model and predictions instantly. The whole idea is explained step by step in the notebook, which should help interested readers experiment with these widgets and come up with lively, interactive machine learning or statistical modeling projects.