GEOG 485:
GIS Programming and Software Development

4.2 Python Dictionaries


In programming, we often want to store larger amounts of data that belong together inside a single variable. In Lesson 2, you already learned about lists, which provide one way to do so. As long as available memory permits, you can store as many elements in a list as you wish, and the append(...) method allows you to add more elements to an existing list.

Dictionaries are another data structure for storing complex information in a single variable. While lists store elements in a simple sequence and access them by their index in that sequence, dictionaries store elements as key-value pairs, and you always use the key to retrieve the corresponding value. This works like a real dictionary, where you look up information (the stored value) under a particular keyword (the key).
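To make the contrast concrete, here is a minimal sketch that stores the same Spanish words once in a list and once in a dictionary. The name spanishList is a hypothetical variable chosen for illustration; the curly-bracket dictionary syntax is explained in detail below.

```python
# The same Spanish words stored two ways (spanishList is a hypothetical name).
spanishList = ["uno", "dos", "tres", "cuatro"]
englishToSpanishDic = {"one": "uno", "two": "dos", "three": "tres", "four": "cuatro"}

# A list is accessed by position: index 1 is the second element.
print(spanishList[1])              # prints dos

# A dictionary is accessed by key: we look up the value stored under "two".
print(englishToSpanishDic["two"])  # prints dos
```

With the list, you have to know that "dos" sits at position 1; with the dictionary, you only need to know the English word.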

Dictionaries are useful for implementing a mapping, for instance from English words to the corresponding Spanish words. Here is how you can create such a dictionary for just the numbers from one to four:

In [1]: englishToSpanishDic = { "one": "uno", "two": "dos", "three": "tres", "four": "cuatro"  }

The curly brackets { } delimit the dictionary, similarly to how square brackets [ ] do for lists. Inside the dictionary, we have four key-value pairs separated by commas. The key and value of each pair are separated by a colon: the key appears on the left of the colon, and the value stored under that key appears on the right.
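One way to see which parts of the literal became keys and which became values is to ask the dictionary itself. This sketch uses the built-in len() function and the keys() and values() methods; sorted() is applied only so that the printed order does not depend on your Python version.

```python
englishToSpanishDic = {"one": "uno", "two": "dos", "three": "tres", "four": "cuatro"}

# Four key-value pairs were written between the curly brackets.
print(len(englishToSpanishDic))              # prints 4

# The keys are the strings on the left of each colon ...
print(sorted(englishToSpanishDic.keys()))    # prints ['four', 'one', 'three', 'two']

# ... and the values are the strings on the right of each colon.
print(sorted(englishToSpanishDic.values()))  # prints ['cuatro', 'dos', 'tres', 'uno']
```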

We can now use the dictionary stored in variable englishToSpanishDic to look up the Spanish word for an English number, for example:

In [2]: print (englishToSpanishDic["two"])
dos

To retrieve a value stored in the dictionary, we use the name of the variable followed by square brackets containing the key under which the value is stored. If we use the same notation on the left side of an assignment operator (=), we can add a new key-value pair to an existing dictionary:

In [3]: englishToSpanishDic["five"] = "cinco"    
In [4]: print (englishToSpanishDic)
{'four': 'cuatro', 'three': 'tres', 'five': 'cinco', 'two': 'dos', 'one': 'uno'}

Here we added the value "cinco", which appears on the right side of the equal sign, to the dictionary under the key "five". If a value had already been stored under the key "five", it would have been overwritten. You may have noticed that the elements in the output are not listed in the order in which they were added. Older versions of Python did not preserve insertion order in dictionaries (since Python 3.7 they do), but either way this doesn't matter, since we always access the elements of a dictionary via their key. If our dictionary contained many more word pairs, we could use it to build a very primitive translator that goes through an English text word by word and replaces each word with the corresponding Spanish word retrieved from the dictionary. Admittedly, this simple approach would probably result in pretty hilarious translations.
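A minimal sketch of such a word-by-word translator might look like the following. The function name translate is hypothetical. It uses the dictionary method get(key, default), which returns the default instead of raising an error when a key is missing, so words without an entry are simply kept unchanged.

```python
englishToSpanishDic = {"one": "uno", "two": "dos", "three": "tres",
                       "four": "cuatro", "five": "cinco"}

def translate(text, dictionary):
    """Replace each word of text by its dictionary entry, word by word (hypothetical helper)."""
    translatedWords = []
    for word in text.split():
        # get() returns the second argument when the key is missing,
        # so untranslatable words are passed through as they are.
        translatedWords.append(dictionary.get(word, word))
    return " ".join(translatedWords)

print(translate("one two three apples", englishToSpanishDic))  # prints uno dos tres apples
```

Grammar, word order, capitalization, and punctuation are all ignored here, which is exactly why the results would be hilarious for real text.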

Now let’s use Python dictionaries to do something a bit more complex. Let’s simulate the process of creating a book index that lists the page numbers on which certain keywords occur. We want to start with an empty dictionary and then go through the book page-by-page. Whenever we encounter a word that we think is important enough to be listed in the index, we add it and the page number to the dictionary.

To create an empty dictionary in a variable called bookIndex, we use the notation with the curly brackets but nothing in between:

In [5]: bookIndex = {}
In [6]: print (bookIndex)
{} 

Now, let’s say the first keyword we encounter in the imaginary programming book we are going through is the word "function" on page 2. We now want to store the page number 2 (value) under the keyword "function" (key) in the dictionary. But since keywords can appear on many pages, what we want to store as values in the dictionary are not individual numbers but lists of page numbers. Therefore, what we put into our dictionary is a list with the number 2 as its only element:

In [7]: bookIndex["function"] =  [2]
In [8]: print (bookIndex)
{'function': [2]} 

Next, we encounter the keyword "module" on page 3. So, we add it to the dictionary in the same way:

In [9]: bookIndex["module"] =  [3]
In [10]: print (bookIndex)
{'function': [2], 'module': [3]}

So now our dictionary contains two key-value pairs, and for each key it stores a list with just a single page number. Let’s say we next encounter the keyword “function” a second time, this time on page 5. Our code to add the additional page number to the list stored under the key “function” now needs to look a bit different, because we already have something stored for it in the dictionary and we do not want to overwrite that information. Instead, we retrieve the currently stored list of page numbers and add the new number to it with append(…):

In [11]: pages = bookIndex["function"] 
In [12]: pages.append(5)
In [13]: print (bookIndex)
{'function': [2, 5], 'module': [3]}
In [14]: print (bookIndex["function"])
[2, 5]
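A quick way to confirm that pages is not a copy of the stored list is Python's is operator, which tests whether two names refer to the very same object. For contrast, this sketch also makes an explicit copy with list(...); the name pagesCopy is hypothetical.

```python
bookIndex = {"function": [2], "module": [3]}

pages = bookIndex["function"]
pages.append(5)

# "is" tests whether both names refer to the same list object.
print(pages is bookIndex["function"])  # prints True
print(bookIndex["function"])           # prints [2, 5]

# list(...) creates an independent copy; appending to it leaves the dictionary unchanged.
pagesCopy = list(bookIndex["function"])
pagesCopy.append(9)
print(bookIndex["function"])           # prints [2, 5]
```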

Please note that we didn’t have to put the list of page numbers stored in variable pages back into the dictionary after adding the new page number. Both the variable pages and the dictionary refer to the same list object, so appending the number through pages changes the list stored in the dictionary as well. Our dictionary now contains a list of two page numbers for the key “function” and still a list with just one page number for the key “module”. You can surely imagine how we would build up a large dictionary for the entire book by continuing this process. Dictionaries can also be used together with a for loop to go through the keys of the elements in the dictionary. This can be used to print out the content of an entire dictionary:

In [15]: for k in bookIndex:  # loop through keys of the dictionary
...:    print ("keyword: " + k)                # print the key
...:    print ("pages: " + str(bookIndex[k]))  # print the value
...:
keyword: function
pages: [2, 5]
keyword: module
pages: [3]
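A real book index lists its keywords alphabetically. As a variant of the loop above (not a replacement for it), this sketch uses the built-in sorted() function together with the dictionary method items(), which yields (key, value) pairs, so the separate lookup bookIndex[k] is no longer needed.

```python
bookIndex = {"module": [3], "function": [2, 5]}

# items() yields (key, value) pairs; sorted() orders them by key,
# so the keywords come out alphabetically regardless of insertion order.
for keyword, pages in sorted(bookIndex.items()):
    print("keyword: " + keyword)
    print("pages: " + str(pages))
```

Even though "module" was inserted first here, "function" is printed first.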

When adding the second page number for “function”, we decided ourselves that this needs to be handled differently from adding the first page number. But how can this be realized in code? We can check whether something is already stored under a key in a dictionary using an if-statement together with the “in” operator:

In [16]: keyword = "function"
In [17]: if keyword in bookIndex: 
...:    print ("entry exists")
...: else:
...:    print ("entry does not exist")
...: 
entry exists
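Note that the in operator looks only at the keys of a dictionary, not at its values. This short sketch illustrates that with the translation dictionary from earlier, and also shows the negated form not in and an explicit search of the values through the values() method.

```python
englishToSpanishDic = {"one": "uno", "two": "dos", "three": "tres", "four": "cuatro"}

# "in" tests the keys of a dictionary ...
print("one" in englishToSpanishDic)           # prints True
# ... so a stored value is not found this way.
print("uno" in englishToSpanishDic)           # prints False
# To search the values instead, test against values() explicitly.
print("uno" in englishToSpanishDic.values())  # prints True
# "not in" is the negated form, handy for the "does not exist" case.
print("five" not in englishToSpanishDic)      # prints True
```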

So, assuming we have the current keyword stored in variable word and the corresponding page number stored in variable pageNo, the following piece of code decides by itself how to add the new page number to the dictionary:

word = "module"
pageNo = 7

if word in bookIndex:
    # entry for word already exists, so we just add page
    pages = bookIndex[word]
    pages.append(pageNo)
else:
    # no entry for word exists, so we add new entry
    bookIndex[word] = [pageNo]
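For comparison, Python dictionaries also have a built-in method, setdefault(key, default), that combines both branches: it returns the value stored under the key, first storing the default there if the key does not exist yet. This sketch wraps it in a hypothetical helper function addPage; it behaves the same as the if/else code above, though the explicit version is arguably easier to read when you are starting out.

```python
bookIndex = {"function": [2, 5], "module": [3]}

def addPage(index, word, pageNo):
    # setdefault() returns the list stored under word, first storing
    # a new empty list there if the key does not exist yet.
    index.setdefault(word, []).append(pageNo)

addPage(bookIndex, "module", 7)    # existing key: page appended to its list
addPage(bookIndex, "variable", 8)  # new key: a new list is created first
print(bookIndex)  # prints {'function': [2, 5], 'module': [3, 7], 'variable': [8]}
```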

A more sophisticated version of this code would also check whether the list of page numbers retrieved in the if-block already contains the new page number, to handle the case where a keyword occurs more than once on the same page. Feel free to think about how this could be included.
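If you want to compare your own solution, here is one possible approach, sketched under the assumption that the page lists should stay in the order in which pages were encountered. The function name addPageOnce and the simulated list of (word, page) occurrences are hypothetical. It reuses the not in operator, this time on the list of pages rather than on the dictionary.

```python
bookIndex = {"function": [2, 5], "module": [3]}

def addPageOnce(index, word, pageNo):
    """Add pageNo under word, skipping it if that page is already listed."""
    if word in index:
        pages = index[word]
        # Only append if this page is not in the list yet.
        if pageNo not in pages:
            pages.append(pageNo)
    else:
        index[word] = [pageNo]

# Simulated occurrences; page 5 for "function" is already listed.
for word, pageNo in [("function", 5), ("function", 5), ("function", 9), ("loop", 4)]:
    addPageOnce(bookIndex, word, pageNo)

print(bookIndex)  # prints {'function': [2, 5, 9], 'module': [3], 'loop': [4]}
```

Checking pageNo not in pages scans the whole list, which is perfectly fine for the short page lists of a book index.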

Readings

Read Zandbergen section 4.17 on using Python dictionaries.