As we already explained, the idea of a Jupyter Notebook is that it can contain code, the output produced by the code, and rich text that, like in a normal text document, can be styled and include images, tables, equations, etc. Jupyter Notebook is a client-server application meaning that the core Jupyter program can be installed and run locally on your own computer or on a remote server. In both cases, you communicate with it via your web browser to create, edit, and execute your notebooks.
The history of Jupyter Notebook goes back to the year 2001 when Fernando Pérez started the development of IPython, a command shell for Python (and other languages) that provides interactive computing functionalities. In 2007, the IPython team started the development of a notebook system based on IPython for combining text, calculations, and visualizations, and a first version was released in 2011. In 2014, the notebook part was split off from IPython and became Project Jupyter, with IPython being the most common kernel (the program component that runs the code in a notebook) for Jupyter but not the only one. Kernels now exist for many common languages, including Ruby, Perl, Java, C/C++, R, and MATLAB.
To get a first impression of Jupyter Notebook, have a look at Figure 3.2 (which you already saw earlier). The excerpt shown consists of two code cells with Python code (those starting with “In [...]:”) and the output produced by running the code (“Out[...]:”), and of three rich text cells before, between, and after the code cells with explanations of what is happening. The currently active cell is marked by the blue bar on the left and the frame around it.
Before we continue to discuss what Jupyter Notebook has to offer, let’s get it running on your computer so that you can directly try out the examples.
Jupyter Notebook is already installed in the Anaconda environment (AC36 or AC37) we created in Section 3.2. If you have the Anaconda Navigator running, make sure it shows the “Home” page and that our AC36 or AC37 environment is selected. Then you should see an entry for Jupyter Notebook with a ‘Launch’ button (Figure 3.3). Click the ‘Launch’ button to start up the application. This ensures that your notebook starts with the correct environment. Starting Jupyter Notebook this way also creates the shortcut for Jupyter Notebook in that conda environment, so you can use it in the way we describe below.
Alternatively, you can start up Jupyter directly without having to open Anaconda first: You will find the Jupyter Notebook application in your Windows application list as a subentry of Anaconda. Be sure that you start the Jupyter Notebook for the recently created conda environment (this entry will only exist if you selected the environment in the Anaconda Navigator dropdown as described above). You can also simply press the Windows key and then type in the first few characters of Jupyter until Jupyter (with the correct conda environment) shows up in the search results.
When you start up Jupyter, two things will happen: The server component of the Jupyter application will start up in a Windows command line window showing log messages, e.g. that the server is running locally under the address http://localhost:8888/ (see Figure 3.4 (a)). When you start Jupyter from the Anaconda Navigator, this will actually happen in the background and you won't get to see the command line window with the server messages. In addition, the web-based client application part of Jupyter will open up in your standard web browser showing you the so-called Dashboard, the interface for managing your notebooks, creating new ones, and also managing the kernels. Right now it will show you the content of the default Jupyter home folder, which is your user’s home folder, in a file browser-like interface (Figure 3.4 (b)).
The file tree view allows you to navigate to existing notebooks on your disk, to open them, and to create new ones. Notebook files have the file extension .ipynb. Let’s start by creating a new notebook file to try out the things shown in the next sections. Click the ‘New…’ button at the top right, then choose the ‘Python 3’ option. A new tab will open up in your browser showing an empty notebook page as in Figure 3.5.
Before we explain how to edit and use the notebook page, please note that the page shows the title of the notebook above a menu bar and a toolbar that provide access to the main operations and settings. Right now, the notebook is still called ‘Untitled...’, so, as a last preparation step, let’s rename the notebook by clicking on the title at the top, typing in ‘MyFirstJupyterNotebook’ as the new title, and then clicking the ‘Rename’ button (Figure 3.6).
If you go back to the still open ‘Home’ tab with the file tree view in your browser, you can see your new notebook listed as MyFirstJupyterNotebook.ipynb and with a green ‘Running’ tag indicating that this notebook is currently open. You can also click on the ‘Running’ tab at the top to only see the currently opened notebooks (the ‘Clusters’ tab is not of interest for us at the moment). Since we created this notebook in the Jupyter root folder, it will be located directly in your user’s home directory. However, you can move notebook files around in the Windows File Explorer if, for instance, you want the notebook to be in your Documents folder instead. To create a new notebook directly in a subfolder, you would first move to that folder in the file tree view before you click the ‘New…’ button.
We will now explain the basics of editing a Jupyter Notebook. We cannot cover all the details here, so if you enjoy working with Jupyter and want to learn all it has to offer as well as all the little tricks that make life easier, the resources listed as [1] to [4] under ‘Links’ at the end of this section may serve as good starting points.
A Jupyter notebook is always organized as a sequence of so-called ‘cells’, with each cell containing either some code or rich text created using the Markdown notation approach (further explained in a moment). The notebook you created in the previous section currently consists of a single empty cell marked by a blue bar on the left that indicates that this is the currently active cell and that you are in ‘Command mode’. When you click into the corresponding text field to add or modify the content of the cell, the bar color will change to green, indicating that you are now in ‘Edit mode’. Clicking anywhere outside of the text area of a cell will change back to ‘Command mode’.
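To make the idea of a notebook as a sequence of cells more concrete, here is a minimal sketch of how such a sequence can be written down as JSON, which is the format used inside .ipynb files. We assume version 4 of the notebook format here, and the cell contents are made up for illustration; a notebook saved by Jupyter contains additional metadata.

```python
import json

# Minimal notebook with one Markdown cell and one code cell
# (assumed layout: notebook format version 4.4; contents are made up).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook\n", "Some explanatory text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # becomes a number once the cell has been run
            "metadata": {},
            "outputs": [],            # filled with the output produced by the cell
            "source": ["print('Hello Jupyter')"],
        },
    ],
}

# This is the kind of text that is stored in an .ipynb file.
ipynb_text = json.dumps(notebook, indent=1)

# Reading it back gives the sequence of cells and their types.
cell_types = [cell["cell_type"] for cell in json.loads(ipynb_text)["cells"]]
print(cell_types)
```

You normally never edit this JSON by hand; the notebook page in your browser is the editor for it.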
Let’s start with a simple example for which we need two cells, the first one with some heading and explaining text and the second one with some simple Python code. To add a second cell, you can simply click on the ‘+’ symbol in the toolbar. The new cell will be added below the first one and become the new active cell shown by the blue bar (and frame around the cell’s content). In the ‘Insert’ menu at the top, you will also find the option to add a new cell above the currently active one. Both adding a cell above and below the current one can also be done by using the keyboard shortcuts ‘A’ and ‘B’ while in ‘Command mode’. To get an overview of the different keyboard shortcuts, you can use Help -> Keyboard Shortcuts in the menu at the top.
Both cells that we have in our notebook now start with “In [ ]:” in front of the text field for the actual cell content. This indicates that these are ‘Code’ cells, so the content will be interpreted by Jupyter as executable code. To change the type of the first cell to Markdown, select that cell by clicking on it, then change the type from ‘Code’ to ‘Markdown’ in the dropdown menu in the toolbar at the top. When you do this, the “In [ ]:” will disappear and your notebook should look similar to Figure 3.8 below. The type of a cell can also be changed by using the keyboard shortcuts ‘Y’ for ‘Code’ and ‘M’ for ‘Markdown’ when in ‘Command mode’.
Let’s start by putting some Python code into the second(!) cell of our notebook. Click on the text field of the second cell so that the bar on the left turns green and you have a blinking cursor at the beginning of the text field. Then enter the following Python code:
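    from bs4 import BeautifulSoup
    import requests

    # Placeholder address (not from the lesson): set this to the page you want to read
    documentURL = 'https://www.example.com/'

    html = requests.get(documentURL).text
    soup = BeautifulSoup(html, 'html.parser')

    print(soup.get_text())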
This brief code example is similar to what you already saw in Lesson 2. It uses the requests Python package to read in the content of an html page from the URL that is provided in the documentURL variable. Then the package BeautifulSoup4 (bs4) is used for parsing the content of the file and we simply use it to print out the plain text content with all tags and other elements removed by invoking its get_text() method in the last line.
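If you want to see what “plain text with all tags removed” means without loading a page from the web, the following sketch may help. Note that it does not use BeautifulSoup; it uses the HTMLParser class from the Python standard library as a simplified stand-in, and the names TextExtractor, strip_tags, and sample_html are made up for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text between the tags of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every piece of text that is not part of a tag
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


sample_html = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
plain_text = strip_tags(sample_html)
print(plain_text)
```

For real web pages, BeautifulSoup is the more robust choice because it also copes with incomplete or malformed HTML.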
While Jupyter by default is configured to periodically autosave the notebook, this would be a good point to explicitly save the notebook with the newly added content. You can do this by clicking the disk symbol or simply pressing ‘S’ while in ‘Command mode’. The time of the last save will be shown at the top of the document, right next to the notebook name. You can always revert back to the last previously saved version (also referred to as a ‘Checkpoint’ in Jupyter) using File -> Revert to Checkpoint. Undo with CTRL-Z works as expected for the content of a cell while in ‘Edit mode’; however, you cannot use it to undo changes made to the structure of the notebook such as moving cells around. A deleted cell can be recovered by pressing ‘Z’ while in ‘Command mode’ though.
Now that we have a cell with some Python code in our notebook, it is time to execute the code and show the output it produces in the notebook. For this, you simply have to click the run button or press ‘SHIFT+Enter’ (this works in both ‘Command mode’ and ‘Edit mode’). This will execute the currently active cell, place the produced output below the cell, and activate the next cell in the notebook. If there is no next cell (like in our example so far), a new cell will be created. While the code of the cell is being executed, a * will appear within the square brackets of the “In [ ]:”. Once the execution has terminated, the * will be replaced by a number that increases by one with each cell execution. This allows for keeping track of the order in which the cells in the notebook have been executed.
Figure 3.9 below shows how things should look after you executed the code cell. The output produced by the print statement is shown below the code in a text field with a vertical scrollbar. We will later see that Jupyter provides the means to display other output than just text, such as images or even interactive maps.
In addition to running just a single cell, there are also options for running all cells in the notebook from beginning to end (Cell -> Run All) or for running all cells from the currently activated one until the end of the notebook (Cell -> Run All Below). The produced output is saved as part of the notebook file, so it will be immediately available when you open the notebook again. You can remove the output for the currently active cell by using Cell -> Current Outputs -> Clear, or of all cells via Cell -> All Output -> Clear.
Let’s now put some heading and information text into the first cell using the Markdown notation. Markdown [5] is a notation and corresponding conversion tool that allows you to create formatted HTML without having to fiddle with tags and with far less typing required. You can see examples of how it works by going to Help -> Markdown in the menu bar and then clicking the “Basic writing and formatting syntax” link on the web page that opens up. This page [6] also provides a very brief overview of the Markdown notation. If you browse through the examples, you will see that a first-level heading can be produced by starting the line with a hash symbol (#). To make some text appear in italics, you can delimit it by * symbols (e.g., *text*), and to make it appear in bold, you would use **text**. A simple bullet point list can be produced by a sequence of lines that start with a - or a *.
Let’s say we just want to provide a title and some bullet point list of what is happening in this code example. Click on the text field of the first cell and then type in:
    # Simple example of reading a web page and converting it to plain text

    How the code works:

    * package **requests** is used to load web page from URL given in variable *documentURL*
    * package **BeautifulSoup4 (bs4)** is used to parse content of loaded web page
    * the call of *soup.get_text()* in the last line provides the content of page as plain text
While typing this in, you will notice that Jupyter already interprets the styling information we are providing with the different notations, e.g. by using a larger blue font for the heading, by using bold font for the text appearing within the **…**, etc. However, to really turn the content into styled text, you will need to ‘run the cell’ (SHIFT+Enter) like you did with the code cell. As a result, you should get the nicely formatted text shown in Figure 3.10 below that depicts our entire first Jupyter notebook with text cell, code cell, and output. If you want to see the Markdown code and edit it again, you will have to double-click the text field or press ‘Enter’ to switch to ‘Edit mode’.
If you have not worked with Markdown styling before, we highly recommend that you take a moment to further explore the different styling options from the “Basic writing and formatting syntax” web page. Either use the first cell of our notebook to try out the different notations or create a new Markdown cell at the bottom of the notebook for experimenting.
This little example only covered the main Jupyter operations needed to create a first Jupyter notebook and run the code in it. The ‘Edit’ menu contains many operations that will be useful when creating more complex notebooks, such as deleting, copying, and moving of cells, splitting and merging functionality, etc. For most of these operations, there also exist keyboard shortcuts. If you find yourself in a situation in which you can’t figure out how to use any of these operations, please feel free to ask on the forums.
Jupyter provides a number of so-called magic commands that can be used in code cells to simplify common tasks. Magic commands are interpreted by Jupyter and, for instance, transformed into Python code before the content is passed on to the kernel for execution. This happens behind the scenes, so you will always only see the magic command in your notebook. Magic commands start with a single % symbol if they are line-oriented meaning they should be applied to the remaining content of the line, and with %% if they are cell-oriented meaning they should be applied to the rest of the cell. As a first example, you can use the magic command %lsmagic to list the available magic commands (Figure 3.11). To get the output you have to execute the cell as with any other code cell.
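To illustrate what “transformed into Python code” means, here is a simplified sketch of the rewriting step. It is not IPython's actual implementation, and the function name translate_magic is made up; the sketch only looks at the first line of a cell. The calls it produces, get_ipython().run_line_magic(...) and get_ipython().run_cell_magic(...), are the ones IPython uses to execute line and cell magics.

```python
def translate_magic(cell_source):
    """Return Python source roughly equivalent to a cell starting with a magic.

    Simplified illustration: only a magic on the first line is handled.
    """
    first_line, _, rest = cell_source.partition("\n")
    if first_line.startswith("%%"):
        # Cell magic: applies to the rest of the cell
        name, _, args = first_line[2:].partition(" ")
        return "get_ipython().run_cell_magic({!r}, {!r}, {!r})".format(
            name, args.strip(), rest
        )
    if first_line.startswith("%"):
        # Line magic: applies to the rest of the line only
        name, _, args = first_line[1:].partition(" ")
        translated = "get_ipython().run_line_magic({!r}, {!r})".format(
            name, args.strip()
        )
        return translated + ("\n" + rest if rest else "")
    # No magic command: the code is passed on unchanged
    return cell_source


line_example = translate_magic("%lsmagic")
cell_example = translate_magic("%%timeit -n 10\nsum(range(100))")
print(line_example)
print(cell_example)
```

The translated code only runs inside a Jupyter or IPython session, because get_ipython() is not available in a plain Python interpreter.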
The %load_ext magic command can be used for loading IPython extensions, which can add new magic commands. The following command loads the IPython rpy2 extension. If that code gives you a long list of errors, then the rpy2 package isn't installed and you will need to go back to Section 3.2 and follow the instructions there.
We recently had cases where loading rpy2 failed on some systems due to the R_HOME environment variable not being set correctly. We therefore added the first two lines below; you will have to adapt the path so that it points to the lib\R folder in your AC Python environment.

    import os, rpy2
    os.environ['R_HOME'] = r'C:\Users\username\anaconda3\envs\AC37\lib\R'  # workaround for R.dll issue occurring on some systems
    %load_ext rpy2.ipython
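Since a mistyped path is an easy mistake to make here, it can help to check that the folder exists before setting R_HOME. The following is a small sketch of that idea; the helper name set_r_home is made up, and the demonstration uses a temporary folder as a stand-in for your lib\R folder.

```python
import os
import tempfile


def set_r_home(r_folder):
    """Set R_HOME only if r_folder exists; return True on success."""
    if not os.path.isdir(r_folder):
        print("Folder not found, R_HOME left unchanged:", r_folder)
        return False
    os.environ["R_HOME"] = r_folder
    return True


# Demonstration with a temporary folder standing in for the lib\R folder
previous = os.environ.get("R_HOME")
with tempfile.TemporaryDirectory() as demo_folder:
    ok_existing = set_r_home(demo_folder)
    ok_missing = set_r_home(os.path.join(demo_folder, "does_not_exist"))

# Restore the original setting after the demonstration
if previous is None:
    os.environ.pop("R_HOME", None)
else:
    os.environ["R_HOME"] = previous

print(ok_existing, ok_missing)
```

In your notebook, you would call set_r_home with the path of your own environment before running %load_ext rpy2.ipython.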
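Using a ? symbol in front of a magic command will open a subwindow with the documentation of that command at the bottom of the browser window. Give it a try by executing the following command (%R is a magic command provided by the rpy2 extension, so the extension needs to be loaded first):

    ?%R

The same works for Python names such as classes, functions, and methods (written without the parentheses of a call), for instance

    ?BeautifulSoup

or

    ?soup.get_text

Give it a try and see if you understand what the documentation is telling you.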
Jupyter notebooks can also include interactive elements, referred to as widgets as in Lesson 2, like buttons, text input fields, sliders, and other GUI elements, as well as visualizations, plots, and animations. Figure 3.12 shows an example that places three button widgets and then simply prints out which button has been pressed when you click on them. The ipywidgets and IPython.display packages imported at the beginning are the main packages required to place the widgets in the notebook. We then define a function that will be invoked whenever one of the buttons is clicked. It simply prints out the description attribute of the button (b.description). In the for-loop we create the three buttons and register the onButtonClick function as the on_click event handler function for all of them.
    from ipywidgets import widgets
    from IPython.display import display

    def onButtonClick(b):
        print("Button " + b.description + " has been clicked")

    for i in range(1, 4):
        button = widgets.Button(description=str(i))
        display(button)
        button.on_click(onButtonClick)
If this code produces the error "Failed to display Jupyter Widget of type Button", the widgets are probably not installed, which we can potentially fix from the Anaconda prompt:

    conda install -n base -c conda-forge widgetsnbextension
    conda install -n AC37 -c conda-forge ipywidgets
After installing the packages, exit Jupyter Notebook, restart it, and try to re-run your code. It's possible you will receive the error again because the widget tries to run before the JavaScript library that runs the widgets has loaded. In that case, select your code cell, wait a few more seconds, and then click Run.
If you're still getting an error, it's likely that your packages didn't install properly (or not in a way that Jupyter/Anaconda could find them). To fix this, close Jupyter Notebook, return to Anaconda Navigator, click ‘Environments’ (on the left), choose your environment, and then search for "ipy". If the packages are missing, you may need to change the "Installed" dropdown to "Not Installed" to find and install them; if they are installed, they may need to be updated (by clicking on the upward-pointing arrow or the blue text).
It is easy to imagine how this example could be extended to provide some choices on how the next analysis step in a longer Data Science project should be performed. Similarly, a slider or text input field could be used to allow the notebook user to change the values of important input variables.
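If the idea of registering an event handler function is new to you, the following sketch shows the mechanism without a running notebook. The class FakeButton is a made-up, pure-Python stand-in for widgets.Button that only mimics its on_click and click methods; real widgets do considerably more behind the scenes.

```python
class FakeButton:
    """Hypothetical stand-in for widgets.Button, for illustration only."""

    def __init__(self, description):
        self.description = description
        self._handlers = []

    def on_click(self, handler):
        # Only remember the function; nothing is called yet
        self._handlers.append(handler)

    def click(self):
        # Simulate a mouse click: call every registered handler with the button
        for handler in self._handlers:
            handler(self)


messages = []


def onButtonClick(b):
    messages.append("Button " + b.description + " has been clicked")


buttons = [FakeButton(description=str(i)) for i in range(1, 4)]
for button in buttons:
    button.on_click(onButtonClick)

buttons[1].click()  # simulate a click on the second button
print(messages)
```

The important point is that on_click only stores the function. The function is called later, once per click, and receives the clicked button as its argument, which is why a single handler can serve all three buttons.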
Let’s close this brief introduction to Jupyter with a few more things that are good to know when starting to write longer and more complex notebooks. Like normal development environments, Jupyter has an autocomplete feature that helps with writing code and can save a lot of typing: while editing code in a code cell, you can press the TAB key and Jupyter will either automatically complete the name or keyword you are writing or provide you with a dropdown list of choices that you can pick from. For instance, type in soup.ge and then press TAB and you get the list of options shown in Figure 3.13, including the get_text() function that we used in our code.
Another useful keyboard command to remember is SHIFT+TAB. When you place the cursor on a variable name or function call and press this key combination, a window will open up showing helpful information like the type of the variable and its current content or the parameters of the function as in Figure 3.14. This is of great help if you are unsure about the different parameters of a function, their order or names. Try out what you get when you use this key combination for different parts of the code in this notebook.
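The information behind TAB and SHIFT+TAB can also be obtained in code, which is sometimes handy in a script. The sketch below uses the built-in dir() function and the inspect module from the standard library to approximate both features; the function buffer_distance is a made-up example, and Jupyter's own completion machinery is more elaborate than this.

```python
import inspect


def buffer_distance(points, radius=10.0, unit="m"):
    """Hypothetical example function; only its signature matters here."""
    return [(p, radius, unit) for p in points]


# Roughly what TAB offers after typing "str.sp": all attribute names with that prefix
completions = [name for name in dir(str) if name.startswith("sp")]
print(completions)

# Roughly what SHIFT+TAB shows for a function: its parameters and its docstring
signature_text = str(inspect.signature(buffer_distance))
print(signature_text)
print(inspect.getdoc(buffer_distance))
```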
As in all programming, it may occasionally happen that something goes completely wrong and the execution of the code in a cell won’t terminate in the expected time or not at all. If that happens, the first thing to try is to use the “Interrupt Kernel” button located to the right of the “Execute cell” button. This should stop the execution of the current cell and you can then modify the code and try to run the cell again. However, sometimes even that won’t help because the kernel has become unresponsive. In that case, the only thing you can do is restart the kernel using the “Restart Kernel” button to the right of the “Interrupt Kernel” button. Unfortunately, that means that you will have to start the execution of the code in your notebook from the beginning because all imports and variable assignments will be lost after the restart.
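With the Python kernel, interrupting a cell raises a KeyboardInterrupt exception in the running code. For long-running loops, you can catch this exception so that the results computed so far are not lost. The following is a minimal sketch of that pattern; the names process_items and demo_work are made up, and the interrupt is simulated by raising the exception ourselves.

```python
def process_items(items, work):
    """Apply work() to each item; keep the partial results if interrupted."""
    results = []
    try:
        for item in items:
            results.append(work(item))
    except KeyboardInterrupt:
        print("Interrupted after", len(results), "of", len(items), "items")
    return results


def demo_work(item):
    # Simulates the user clicking "Interrupt Kernel" while item 3 is being processed
    if item == 3:
        raise KeyboardInterrupt
    return item * item


partial = process_items([1, 2, 3, 4], demo_work)
print(partial)
```

This only helps as long as the kernel still reacts to the interrupt; if it has become unresponsive, restarting it remains the only option.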
Once you have finished your notebook, you may want to publish or share it. There are many options by which to do so. In the File menu, there exists the “Download as…” option for obtaining versions of your notebook in different formats. The .ipynb format, as we mentioned, is the native format in which Jupyter saves the notebooks. If you make the .ipynb file available to someone who has access to Jupyter, that person can open the notebook and run it or modify the content. The .py option allows for exporting content as a Python script, so that the code can be run outside of Jupyter. If you want a version of your notebook that others can read even without access to Jupyter, there are several options like exporting the notebook as HTML, as LaTeX, or as PDF. Some of these options require additional software tools to be installed on your computer and there are some limitations. For instance, if you export your notebook as HTML to add it to your personal web page, interactive widgets such as the interactive web map you will see later in Section 3.10 will not be included.
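To give an idea of what the export as a Python script involves, here is a simplified sketch that collects the code cells of a notebook into a single script. It is not the converter that Jupyter itself uses (that job is done by the nbconvert tool); the function name code_cells_to_script and the small demo notebook are made up, and we assume version 4 of the notebook format.

```python
import json


def code_cells_to_script(ipynb_text):
    """Join the source of all code cells into one Python script string."""
    notebook = json.loads(ipynb_text)
    parts = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # "source" may be stored as a list of lines or as a single string
            parts.append("".join(source) if isinstance(source, list) else source)
    return "\n\n".join(parts) + "\n"


# Made-up demo notebook with one Markdown cell and two code cells
demo_notebook = json.dumps({
    "nbformat": 4, "nbformat_minor": 4, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["y = x + 1"]},
    ],
})

script_text = code_cells_to_script(demo_notebook)
print(script_text)
```

Markdown cells and stored outputs are simply left out in this sketch, so for anything you want to share, the “Download as…” menu remains the better choice.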
To close this section, we want to again refer to the links we provided at the beginning of Section 3.6.2 if you want to keep reading about Jupyter and learn tricks that we weren't able to cover in this brief introduction. In the remainder of this lesson, please use Jupyter to try out the code examples by entering them into a Jupyter notebook and running the code there to get some more practice with Jupyter.
Links
[1] http://jupyter.org/documentation
[2] https://jupyter-notebook.readthedocs.io/en/stable/
[3] https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
[4] https://medium.com/datacamp/jupyter-notebook-tutorial-the-definitive-guide-660c7e651ecd
[5] https://daringfireball.net/projects/markdown/
[6] http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html