
Data Structures: Pandas
The previous page discussed data structures in general, using common terminology. Here, we focus on two data structures common to the Pandas library:
1. Series
Pandas has a specific data structure called a Series. It is very similar to a vector because it is one-dimensional, but the key difference is that a Series is labeled. For an example, run the following code:
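A minimal sketch of such a Series, assuming the variable name s and the values and labels described in the surrounding text:

```python
import pandas as pd

# Data values 1-5, labeled a-e through the "index" parameter
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Printing a Series shows each label next to its value
print(s)
```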
So, the numbers 1-5 are our data in this example, whereas a-e, defined in the parameter “index”, are our indices or labels. These labels are handy for retrieving specific values out of the Series; try adding print(s['c']) to the above code and see what happens. This should print the value 3.
If we didn't supply the “index” parameter, the default is to number the indices with sequential integers starting at 0.
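To see the default labeling in action, here is a short sketch that builds the same data without an “index” parameter (the variable name t is only illustrative):

```python
import pandas as pd

# No "index" parameter: Pandas assigns integer labels starting at 0
t = pd.Series([1, 2, 3, 4, 5])

print(t.index)  # a range of integer labels beginning at 0
print(t[2])     # the label 2 refers to the third value in the data
```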
2. DataFrame
Pandas also has a table-style data structure called a “DataFrame”. Typically, a DataFrame has column labels (the same as the headings in a table; these are effectively treated as variable names) and row labels or indices (think of these as row numbers). So, for example, we can coerce the table from the previous page into a Pandas DataFrame:
Note that the first variable, HOME ID, is entered as a list (values separated by commas and enclosed in [ ]), whereas the other two variables are entered as Series. This is just to show some of the various ways you can enter data. Furthermore, as with the Series function, we can customize our row indices by supplying an “index” parameter to the DataFrame function.
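As a rough sketch of this pattern, the code below builds a small DataFrame from one list and two Series. Only the HOME ID column name comes from the text; the other column names (VALUE_A, VALUE_B), all of the values, and the row labels are hypothetical stand-ins, not the table from the previous page.

```python
import pandas as pd

# Hypothetical stand-in data; only the "HOME ID" name comes from the text
df = pd.DataFrame({
    "HOME ID": [1, 2, 3],                       # entered as a plain list
    "VALUE_A": pd.Series([10.5, 20.1, 30.7]),   # entered as a Series
    "VALUE_B": pd.Series(["x", "y", "z"]),      # entered as a Series
})
print(df)

# Custom row labels through the "index" parameter (hypothetical labels).
# Plain lists are used here because Series that carry their own default
# labels would be realigned to the new index instead of relabeled.
df_labeled = pd.DataFrame(
    {"HOME ID": [1, 2, 3], "VALUE_A": [10.5, 20.1, 30.7]},
    index=["r1", "r2", "r3"],
)
print(df_labeled.loc["r2", "HOME ID"])
```

Each dictionary key becomes a column label, and each column must supply the same number of rows.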
Throughout this course, we'll be working a lot with DataFrames. This will be our default structure for datasets. The next few pages will show you how to import data into a DataFrame object in Python. In Lesson 2, we will cover some common tools for manipulating data in a DataFrame. It is critical to feel comfortable with DataFrames, as they are an incredibly useful way to store data and essential for any Pandas-based data analytics.