
3.8.4 Adding / removing columns and rows
Adding a new column to a data frame is very simple when you have the values for that column ready in a list. In the following example, we want to add a new column ‘m5’ with additional measurements. The numbers are already stored in a list m5values, which is defined in the first line of the example code. To add the column, we simply make an assignment to df['m5'] in the second line. If a column ‘m5’ already existed, its values would be overwritten by the values from m5values. Since this is not the case, a new column named ‘m5’ is added with the values from m5values. Note that the list must contain exactly one value per row of the data frame (six in this case).
```python
m5values = [0.432523, -0.123223, -0.231232, 0.001231, -0.23698, -0.41231]
df['m5'] = m5values   # 'm5' does not exist yet, so a new column is added
df                    # display the updated data frame
```
| | m1 | m2 | m3 | m4 | m5 |
|---|---|---|---|---|---|
| 2017-01-01 | 1.200000 | 0.163613 | 0.510162 | 0.628612 | 0.432523 |
| 2017-01-02 | 0.056027 | 0.056027 | 0.025050 | 0.283586 | -0.123223 |
| 2017-01-03 | -0.840010 | -0.840010 | -0.422343 | 1.022622 | -0.231232 |
| 2017-01-04 | -0.721431 | -0.721431 | -0.966351 | -0.380911 | 0.001231 |
| 2017-01-05 | 1.200000 | 0.655267 | -1.339799 | 1.075069 | -0.236980 |
| 2017-01-06 | 0.192804 | 0.192804 | -1.160902 | 0.525051 | -0.412310 |
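The following minimal, self-contained sketch shows both behaviors described above in isolation. It builds a small, made-up data frame as a stand-in for df (the values are purely illustrative). It then adds a new column, overwrites an existing one, and shows what happens when the list length does not match the number of rows.

```python
import pandas as pd

# A small, made-up data frame standing in for the df used in this section
df = pd.DataFrame(
    {"m1": [1.0, 2.0, 3.0], "m2": [0.5, 0.25, 0.125]},
    index=pd.date_range("2017-01-01", periods=3),
)

# 'm3' does not exist yet, so this assignment adds a new column
df["m3"] = [10.0, 20.0, 30.0]

# 'm1' already exists, so this assignment overwrites its values
df["m1"] = [-1.0, -2.0, -3.0]

# The list needs exactly one value per row; otherwise pandas raises a ValueError
try:
    df["m4"] = [1.0, 2.0]
except ValueError as err:
    print("Length mismatch:", err)

print(df)
```

Because the mismatched assignment fails, no column ‘m4’ is created and df keeps its three columns.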
To add new rows, we can simply assign values to rows selected via the loc indexer. For example, we could add a new row for January 7, 2017, by writing
```python
df.loc[pd.Timestamp('2017-01-07'), :] = [ ... ]
```
where the part after the equal sign is a list of five numbers, one for each column. Again, if a row for January 7 already existed, its values would be replaced. The following example uses this idea to create new rows for January 7 to 9 using a for loop:
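```python
import numpy as np
import pandas as pd
for i in range(7, 10):
    # new row label 2017-01-07 ... 2017-01-09; five random numbers in [0, 1), one per column
    df.loc[pd.Timestamp('2017-01-0' + str(i)), :] = [np.random.rand() for j in range(5)]
df
```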
| | m1 | m2 | m3 | m4 | m5 |
|---|---|---|---|---|---|
| 2017-01-01 | 1.200000 | 0.163613 | 0.510162 | 0.628612 | 0.432523 |
| 2017-01-02 | 0.056027 | 0.056027 | 0.025050 | 0.283586 | -0.123223 |
| 2017-01-03 | -0.840010 | -0.840010 | -0.422343 | 1.022622 | -0.231232 |
| 2017-01-04 | -0.721431 | -0.721431 | -0.966351 | -0.380911 | 0.001231 |
| 2017-01-05 | 1.200000 | 0.655267 | -1.339799 | 1.075069 | -0.236980 |
| 2017-01-06 | 0.192804 | 0.192804 | -1.160902 | 0.525051 | -0.412310 |
| 2017-01-07 | 0.768633 | 0.559968 | 0.591466 | 0.210762 | 0.610931 |
| 2017-01-08 | 0.483585 | 0.652091 | 0.183052 | 0.278018 | 0.858656 |
| 2017-01-09 | 0.909180 | 0.917903 | 0.226194 | 0.978862 | 0.751596 |
In the body of the for loop, the part to the left of the equal sign uses loc[...] to refer to a row for the new date, which is built from the loop variable i. The part on the right uses the numpy function np.random.rand() inside a list comprehension to create a list of five random numbers. These numbers are assigned to the cells of the new row. Because the values are random, running this code yourself will produce different numbers than those shown above.
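Building labels with string concatenation like '2017-01-0' + str(i) only works for single-digit days. The sketch below shows one alternative, assuming a small, made-up data frame rather than the df above: let pd.date_range generate the Timestamp labels directly. It also swaps in a seeded NumPy generator (np.random.default_rng) for np.random.rand(), so the results are reproducible. Finally, it shows that assigning to a label that already exists replaces that row instead of adding a new one. The sketch uses the shorter df.loc[day] form, which selects the whole row just like df.loc[day, :].

```python
import numpy as np
import pandas as pd

# A small, made-up data frame with a DatetimeIndex (stand-in for df)
df = pd.DataFrame(
    np.zeros((2, 3)),
    columns=["m1", "m2", "m3"],
    index=pd.date_range("2017-01-01", periods=2),
)

# Seeded generator so the "random" rows are the same on every run
rng = np.random.default_rng(42)

# date_range yields Timestamps directly, so no string building is needed
# and two-digit days such as 2017-01-10 work as well
for day in pd.date_range("2017-01-03", "2017-01-12"):
    # The label is not in the index yet, so a new row is appended
    df.loc[day] = rng.random(len(df.columns))

# The label already exists, so this replaces the row instead of adding one
df.loc[pd.Timestamp("2017-01-01")] = [9.0, 9.0, 9.0]

print(df)
```

The data frame ends up with 12 rows: the 2 original ones plus 10 new ones for January 3 to 12.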
If you ever want to remove columns or rows from a data frame, you can do so by using df.drop(...). The first parameter given to drop(...) is a single column or row name or, alternatively, a list of names that should be dropped. By default, drop(...) will consider these as row names. To indicate these are column names that should be removed, you have to specify the additional keyword argument axis=1 . We will see an example of this in a moment.
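Here is a minimal sketch of both uses of drop(...). It runs on a small, made-up data frame rather than the df from this section. It also illustrates that drop(...) returns a new data frame by default and leaves the original unchanged.

```python
import pandas as pd

# A small, made-up data frame (stand-in for df)
df = pd.DataFrame(
    {"m1": [1.0, 2.0, 3.0], "m2": [4.0, 5.0, 6.0], "m3": [7.0, 8.0, 9.0]},
    index=pd.date_range("2017-01-01", periods=3),
)

# Default (axis=0): the name is interpreted as a row (index) label
fewer_rows = df.drop(pd.Timestamp("2017-01-02"))

# axis=1: the names are interpreted as column labels; a list drops several at once
fewer_cols = df.drop(["m2", "m3"], axis=1)

# drop() returned new data frames; df itself still has all rows and columns
print(fewer_rows)
print(fewer_cols)
print(df.shape)
```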