NGA Advanced Python Programming for GIS, GLGI 3001-1

Reading Through Records


Let’s examine how to read through the records of feature classes and tables.

The arcpy module contains objects called cursors that allow you to move through the records in a table. At ArcGIS 10.1, Esri released a new data access module (arcpy.da), which offered faster performance along with more robust behavior when crashes or errors were encountered with the cursor.  The module contains a different cursor class for each operation a developer might want to perform on table records: one for selecting, or reading, existing records; one for making changes to existing records; and one for adding new records.  We'll discuss the editing cursors later and focus for now on the cursor used for reading tabular data, the search cursor.

As with all geoprocessing classes, the SearchCursor class is documented in the Help system.  (Be sure when searching the Help system that you choose the SearchCursor class found in the Data Access module.  An older, pre-10.1 class is available through the arcpy module and appears in the Help system as an "ArcPy Function.")  

The common workflow for reading data with a search cursor is as follows:

  1. Create the search cursor. This is done by calling arcpy.da.SearchCursor(). Its parameters specify the dataset and the fields to read and, optionally, a SQL filter that limits which rows are returned and a SQL clause that provides some limited sorting.
  2. Set up a loop that will iterate through the rows until there are no more available to read.
  3. Within the loop, do something with the values in the current row.
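
The three steps can be sketched as follows, with the optional filter and sort included. This is only an illustration under stated assumptions: the POP2000 field and the helper names build_where_clause and print_large_cities are hypothetical, and an ORDER BY passed through sql_clause is honored only by geodatabase and database sources, not by shapefiles.

```python
def build_where_clause(field_name, minimum):
    """Return a SQL filter such as 'POP2000 >= 100000' (hypothetical helper)."""
    return "{} >= {}".format(field_name, minimum)


def print_large_cities(feature_class, name_field, pop_field, minimum):
    """Print the name and population of qualifying rows, largest first."""
    # Imported here so the helper above can be tried without ArcGIS installed.
    import arcpy

    where = build_where_clause(pop_field, minimum)
    # sql_clause is a (prefix, postfix) pair; ORDER BY goes in the postfix.
    order = (None, "ORDER BY {} DESC".format(pop_field))

    # Step 1: create the cursor with a field list, a filter, and a sort.
    with arcpy.da.SearchCursor(feature_class, [name_field, pop_field],
                               where_clause=where, sql_clause=order) as cursor:
        # Step 2: loop until no rows remain.
        for row in cursor:
            # Step 3: do something with the current row's values.
            print(row[0], row[1])
```

A call such as print_large_cities(r"C:\Data\USA\USA.gdb\Cities", "NAME", "POP2000", 100000) would then print only the cities at or above the threshold, assuming the feature class actually has a POP2000 field.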

Here's a very simple example of a search cursor that reads through a point dataset of cities and prints the name of each.

# Prints the name of each city in a feature class

import arcpy

featureClass = r"C:\Data\USA\USA.gdb\Cities"

with arcpy.da.SearchCursor(featureClass,["NAME"]) as cursor:
    for row in cursor:
        print (row[0])

Important points to note in this example:

  • The cursor is created using a "with" statement. Although the explanation of "with" is somewhat technical, the key thing to understand is that it allows your cursor to exit the dataset gracefully, whether it crashes or completes its work successfully. This is a big issue with cursors, which can sometimes maintain locks on data if they are not exited properly.
  • The "with" statement requires that you indent all the code beneath it. After you create the cursor in your "with" statement, you'll initiate a for loop to run through all the rows in the table. This requires additional indentation.
  • Creating a SearchCursor object requires specifying not just the desired table/feature class, but also the desired fields within it, as a list.  Supplying this list speeds up the work of the cursor because it does not have to deal with the potentially dozens of fields included in your dataset. In the example above, the list contains just one field, "NAME".
  • The "with" statement creates a SearchCursor object, and declares that it will be named "cursor" in any subsequent code. 
  • A for loop is used to iterate through the SearchCursor (using the "cursor" name assigned to it).  Just as in a loop through a list, the iterator variable can be assigned a name of your choice (here, it's called "row" to represent the feature as a “row” of values).
  • Retrieving values out of the rows is done using the index position of the field name in the list you submitted when you created the object. Since the above example submits only one item in the list, the index position of "NAME" within that list is 0 (remember that we start counting from 0 in Python). 
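
To make the link between field order and index position concrete, here is a small sketch that pairs each field name with the value at the same position in the row. The helper name rows_as_dicts, the POP2000 field, and the sample rows are made up for illustration; in a real script the rows would come from an arcpy.da.SearchCursor created with the same field list.

```python
def rows_as_dicts(field_names, rows):
    """Yield each row as a dict keyed by field name (hypothetical helper)."""
    for row in rows:
        # zip pairs the first field name with row[0], the second with row[1], ...
        yield dict(zip(field_names, row))


fields = ["NAME", "POP2000"]
# Stand-in rows with made-up values; a cursor would supply tuples like these.
sample_rows = [("City A", 1000), ("City B", 3000)]

for record in rows_as_dicts(fields, sample_rows):
    # record["NAME"] is the same value as row[0], record["POP2000"] as row[1]
    print(record["NAME"], record["POP2000"])
```

Looking values up by name costs a little extra work per row, but it can make scripts with long field lists easier to read than bare index numbers.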

Here's another example where something more complex is done with the row values. This script finds the average population for counties in a dataset. To find the average, you need to divide the total population by the number of counties. The code below loops through each record and keeps a running total of the population and the number of records counted. Once all the records have been read, only one line of division is necessary to find the average. You can download the sample data for this script.

# Finds the average population in a county dataset
import arcpy

featureClass = r"C:\Data\Pennsylvania\Counties.shp"
populationField = "POP1990"
nameField = "NAME"

average = 0
totalPopulation = 0
recordsCounted = 0

print("County populations:")

with arcpy.da.SearchCursor(featureClass, [nameField, populationField]) as countiesCursor:
    for row in countiesCursor:
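        print(row[0] + ": " + str(row[1]))
        # Keep a running total of the population and the number of records
        totalPopulation += row[1]
        recordsCounted += 1

# Avoid dividing by zero if the dataset contains no records
if recordsCounted > 0:
    average = totalPopulation / recordsCounted
    print("Average population for a county is " + str(average))
else:
    print("No records were found.")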

Differences between this example and the previous one:

  • The field list includes multiple fields, with their names having been stored in variables near the top of the script. 
  • The SearchCursor object goes by the variable name countiesCursor rather than cursor.  It is good practice for the name to indicate what type of cursor you are creating. A simple convention is to start the name with the first letter of the cursor type.  For example:
      sCur for a SearchCursor
      uCur for an UpdateCursor
      iCur for an InsertCursor

Before moving on, you should note that cursor objects have a couple of methods that you may find helpful in traversing their associated records.  To understand what these methods do, and to better understand cursors in general, it may help to visualize the attribute table with an arrow pointing at the "current row." When a cursor is first created, that arrow is pointing just above the first row in the table. When a cursor is included in a for loop, as in the above examples, each execution of the for statement moves the arrow down one row and assigns that row's values to the row variable.  If the for statement is executed when the arrow is pointing at the last row, there is not another row to advance to and the loop will terminate.  (The row variable will be left holding the last row's values.) 

SearchCursors are read only and return the attributes as a tuple of values.  Remember that tuples are immutable, and you will not be able to assign new values to this tuple “row”. 
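
The immutability of a row can be seen with plain Python, without a dataset. The tuple below is a stand-in with made-up values for a row returned by a search cursor.

```python
# Stand-in for a row returned by a search cursor (values are made up).
row = ("City A", 1000)

try:
    row[0] = "City B"  # tuples do not support item assignment
    changed = True
except TypeError as error:
    changed = False
    print("Cannot edit the row:", error)

# To work with modified values in memory, copy the row into a list first.
editable = list(row)
editable[0] = "City B"
```

Changing the copy affects only the list in memory; nothing is written back to the table.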

Imagine that you wanted to iterate through the rows of the cursor a second time.  It is best practice to limit repeated iterations over the same dataset and instead store the values in a dictionary that can be referenced in the second loop. But to show that it can be done: if you were to modify the Cities example above, adding a second loop immediately after the first, you'd see that the second loop would never "get off the ground" because the cursor's internal pointer is still pointing at the last row.  To deal with this problem, you could just re-create the cursor object.  However, a simpler solution is to call the cursor's reset() method. For example:

cursor.reset()

This will cause the internal pointer (the arrow) to move just above the first row again, enabling you to loop through its rows again.
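
The dictionary approach mentioned above might look like the following sketch. It uses a plain Python iterator with made-up values as a stand-in for a cursor, so it shows the "second loop finds nothing" behavior but not reset() itself, which list iterators do not have. The helper name build_lookup is hypothetical.

```python
def build_lookup(rows):
    """Read every (name, population) row once and return a name -> population dict."""
    return {name: population for name, population in rows}


# Stand-in rows; iter() makes the list behave like a one-pass cursor.
cursor_like = iter([("City A", 1000), ("City B", 3000)])

populations = build_lookup(cursor_like)  # first and only pass over the rows
leftover = list(cursor_like)             # a second pass finds nothing left

# The second "loop" reads from the dictionary instead of the dataset.
largest = max(populations.values())
for name, population in populations.items():
    print(name, population / largest)
```

With a real cursor, build_lookup would be called inside the "with" block, and any later processing would use the dictionary rather than reading the table again.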

The other method supported by cursor objects is the next() method, which allows you to retrieve rows without using a for loop.  Returning to the internal pointer concept, a call to the next() method moves the pointer down one row and returns the row's values (again, as a tuple).  For example:

row = cursor.next()

An alternative means of iterating through all rows in a cursor is to use the next() method together with a while loop.  Here is the original Cities example modified to iterate using next() and while:

# Prints the name of each city in a feature class (using next() and while)

import arcpy

featureClass = r"C:\Data\USA\USA.gdb\Cities"

with arcpy.da.SearchCursor(featureClass, ["NAME"]) as cursor:
    try:
        row = cursor.next()
        while row:
            print (row[0])
            row = cursor.next()
    except StopIteration:
        pass

Points to note in this script:

  • This approach requires invoking next() both prior to the loop and within the loop.
  • Calling the next() method when the internal pointer is pointing at the last row raises a StopIteration exception.  That exception, not the while condition, is what ends the loop; the try/except block catches it so the script finishes without displaying an error.

You should find that using a for loop is usually the better approach, and in fact, you won't see the next() method even listed in the documentation of arcpy.da.SearchCursor.  We're pointing out the existence of this method because a) older ArcGIS versions have cursors that can/must be traversed in this way, so you may encounter this coding pattern if looking at older scripts, and b) you may want to use the next() method if you're in a situation where you know the cursor will contain exactly one row (or you're interested in only the first row if you have applied some sort of sorting on the results). 
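
For the "first row only" case, one possible sketch uses Python's built-in next() function with a default value, which avoids the try/except block. The helper name first_row is hypothetical, and the sketch assumes that a search cursor behaves like any other Python iterator when passed to iter() and next(); it is demonstrated here only with plain lists of made-up values.

```python
def first_row(rows):
    """Return the first row of an iterable, or None if it has no rows."""
    # The default argument stops next() from raising StopIteration.
    return next(iter(rows), None)


# Stand-in data with made-up values; with arcpy this might be used as:
#     with arcpy.da.SearchCursor(featureClass, ["NAME"]) as cursor:
#         row = first_row(cursor)
print(first_row([("City A",), ("City B",)]))
print(first_row([]))
```

Checking the result against None lets the script handle an empty table without an exception handler.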

Lesson content developed by Jim Detwiler, Jan Wallgrun and James O’Brien