Again, we are going to close out the lesson with a few practice exercises that focus on the new Python concepts introduced in this lesson (regular expressions and higher-order functions) as well as on working with tabular data with pandas, in preparation for this lesson's homework assignment. In the homework assignment, you are also going to use geopandas, the Esri ArcGIS API for Python, and GDAL/OGR again to get some more practice with these libraries. What was said in the introduction to the practice exercises of Lesson 2 holds here as well: don't worry if you have trouble finding the perfect solution on your own. Studying the solutions carefully is another way of learning and improving your skills. The solutions to the three practice exercises can again be found in the following subsections.
Practice Exercise 1: Regular Expressions (see Section 3.3)
Write a function that tests whether an entered string is a valid date using the format "YYYY-MM-DD". The function takes the string to test as a parameter and then returns True or False. The YYYY can be any 4-digit number, but the MM needs to be a valid 2-digit number for a month (with a leading 0 for January to September). The DD needs to be a number between 01 and 31 but you don’t have to check whether this is a valid number for the given month. Your function should use a single regular expression to solve this task.
Here are a few examples you can test your implementation with:
- "1977-01-01" -> True
- "1977-00-01" -> False (00 not a valid month)
- "1977-23-01" -> False (23 not a valid month)
- "1977-12-31" -> True
- "1977-11-01asdf" -> False (you need to make sure there are no additional characters after the date)
- "asdf1977-11-01" -> False (you need to make sure there are no additional characters before the date)
- "9872-12-31" -> True
- "0000-12-33" -> False (33 is not a valid day)
- "0000-12-00" -> False (00 not a valid day)
- "9872-15-31" -> False (15 is not a valid month)
Practice Exercise 2: Higher Order Functions (see Section 3.4)
We mentioned that the higher-order function reduce(...) can be used to do things like testing whether all elements in a list of Booleans are True. This exercise has three parts:
- Given a list l containing only Boolean values as elements (e.g. l = [True, False, True]), use reduce(…) to test whether all elements in l are True. What would you need to change to test whether at least one element is True? (Hint: you will have to figure out which logical operator is the right one to use and then look up what it is called in the Python module operator; then figure out what the right initial value for the third parameter of reduce(...) is.)
- Now instead of a list of Booleans, you have a list of integer numbers (e.g. l = [-4, 2, 1, -6]). Use a combination of map(…) and reduce(…) to check whether or not all numbers in the list are positive (> 0).
- Implement reduce(...) yourself and test it with the example from part 1. Your function myReduce(…) should have the three parameters f (function), l (list), and i (initial value). It should consist of a for-loop that goes through the elements of the list, and it must not use any other higher-order function (in particular not the actual reduce(...) function).
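As a reminder of the mechanics involved, here is a minimal sketch that uses reduce(...) with arithmetic operators from the operator module rather than the logical operators the exercise asks for. The variable names and example values are made up for this illustration. Note how the third parameter is the value the computation starts from, and how it is chosen so that it does not change the result.

```python
from functools import reduce
import operator

numbers = [3, 5, 2]

# reduce(f, l, i) starts with i and folds in one element at a time:
# ((0 + 3) + 5) + 2
total = reduce(operator.add, numbers, 0)

# With multiplication, the neutral starting value is 1 instead of 0
product = reduce(operator.mul, numbers, 1)

# For an empty list, reduce simply returns the initial value
empty_total = reduce(operator.add, [], 0)

# map(...) transforms each element first; reduce(...) then combines the results
words = ["geo", "pandas"]
total_length = reduce(operator.add, map(len, words), 0)

print(total, product, empty_total, total_length)  # 10 30 0 9
```

The same pattern carries over to the exercise: only the operator, the initial value, and the function handed to map(...) need to change.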
Practice Exercise 3: Pandas (see Section 3.8)
Below is an imaginary list of students and scores for three different assignments.
| | Name | Assignment 1 | Assignment 2 | Assignment 3 |
|---|---|---|---|---|
| 1 | Mike | 7 | 10 | 5.5 |
| 2 | Lisa | 6.5 | 9 | 8 |
| 3 | George | 4 | 3 | 7 |
| 4 | Maria | 7 | 9.5 | 4 |
| 5 | Frank | 5 | 5 | 5 |
Create a pandas data frame for this data (e.g. in a fresh Jupyter notebook). The column and row labels should be as in the table above.
Now, use pandas operations to add a new column to that data frame and assign it the average score over all assignments for each row.
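The two steps above follow a common pandas pattern, sketched here on a different, made-up table so that the exercise itself is left to you. The data frame sales and its columns are hypothetical. The sketch assumes the row labels are passed through the index parameter and that the row-wise average is computed with .mean(...) using axis=1.

```python
import pandas as pd

# Hypothetical example data, not the table from the exercise
sales = pd.DataFrame(
    {
        "Product": ["Map", "Atlas", "Globe"],
        "Q1": [4, 10, 7],
        "Q2": [6, 2, 7],
    },
    index=[1, 2, 3],  # custom row labels instead of the default 0, 1, 2
)

# Select the numeric columns and average them row by row (axis=1),
# then store the result in a new column
sales["Mean"] = sales[["Q1", "Q2"]].mean(axis=1)

print(sales)
```

Selecting the numeric columns explicitly before calling .mean(...) keeps the text column out of the calculation.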
Next, perform the following subsetting operations using pandas filtering with Boolean indexing:
- Get all students with an Assignment 1 score < 7 (show all columns)
- Get all students with Assignment 1 and Assignment 2 scores both > 6 (show all columns)
- Get all students with at least one score < 5 over all assignments (show all columns)
(Hint: an alternative to using the logical or (|) over all columns with scores is to call the .min(…) method of a data frame with the parameter "axis = 1" to get the minimum value over all columns for each row. This can be used here to first create a vector with the minimum score over all three assignments and then create a Boolean vector from it based on whether or not the value is <5. You can then use this vector for the Boolean indexing operation.)
- Get all students whose names start with 'M' and only show the name and average score columns
(Hint: there is also a method called .map(…) that you can use to apply a function or lambda expression to an individual column (or, in pandas 2.1 and later, to an entire data frame). The result is a new column or data frame containing the results of applying the function/expression to each cell. This can be used here to create a Boolean vector based on whether or not the name starts with ‘M’ (string method startswith(…)). This vector can then be used for the Boolean indexing operation. Then you just have to select the columns of interest with the last part of the statement.)
- Finally, sort the table by name.
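The techniques mentioned in the hints above can be sketched on a small, made-up table (the data frame sales and all its values are hypothetical and unrelated to the student scores). This is one possible way to write these operations, not the only one.

```python
import pandas as pd

# Hypothetical example data, not the table from the exercise
sales = pd.DataFrame(
    {
        "Product": ["Map", "Atlas", "Globe", "Almanac"],
        "Q1": [4, 10, 7, 3],
        "Q2": [6, 2, 7, 9],
    },
    index=[1, 2, 3, 4],
)

# Combine two conditions with & (each condition needs its own parentheses)
both_high = sales[(sales["Q1"] > 5) & (sales["Q2"] > 5)]

# Row-wise minimum over the numeric columns, then a Boolean vector from it
row_min = sales[["Q1", "Q2"]].min(axis=1)
low = sales[row_min < 4]

# Apply a lambda expression to every cell of one column to get a Boolean vector
starts_with_a = sales["Product"].map(lambda s: s.startswith("A"))

# Boolean indexing on the rows plus a column selection in a single statement
a_products = sales.loc[starts_with_a, ["Product", "Q2"]]

# Sort the rows alphabetically by one column (returns a new data frame)
sorted_sales = sales.sort_values(by="Product")

print(both_high, low, a_products, sorted_sales, sep="\n\n")
```

Using .loc[rows, columns] keeps the row filter and the column selection in one place, which tends to be easier to read than chaining two indexing operations.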