NGA Advanced Python Programming for GIS, GLGI 3001-1

Arcpy multiprocessing examples


Now that we have completed a non-ArcGIS parallel processing exercise, let's look at a couple of examples using ArcGIS functions. There are several caveats, or gotchas, to using multiprocessing with ArcGIS, and it is important to cover them up front because they affect how we can write our code.

Esri describes several best practices for multiprocessing with arcpy. These include: 

  • Use “memory” workspaces to store temporary results because, as noted earlier, memory is faster than disk.
  • Avoid writing to file geodatabase (FGDB) data types and GRID raster data types. These data formats can often cause schema locking or synchronization issues because they do not support concurrent writing; that is, only one process can write to them at a time. You might have seen a version of this problem in arcpy previously if you tried to modify a feature class in Python while it was open in ArcGIS. The problem is magnified if you have an FGDB and you are trying to write many feature classes to it at once. Even if all of the feature classes are independent, you can only write them to the FGDB one at a time.
  • Use 64-bit. This isn’t an issue if we are writing code in ArcGIS Pro (although Esri does recommend using a version of Pro later than 1.4) because we are already using 64-bit. If you were planning on using Desktop as well, you would need to use ArcGIS Server 10.5 (or later) or ArcGIS Desktop with Background Geoprocessing (64-bit). The reason, as we previously noted, is that 64-bit processes can access significantly more memory, which might help when working with large datasets that don’t fit within the 32-bit memory limit of 4 GB.
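As a quick check related to the last point, the following minimal sketch reports whether the Python interpreter running a script is a 64-bit build. It uses only the standard library; the helper names pointer_bits and is_64bit are made up for this illustration and are not part of arcpy.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer size of this interpreter in bits (32 or 64)."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


def is_64bit():
    """Return True if this interpreter is a 64-bit build."""
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    return sys.maxsize > 2**32


# A 32-bit process can address at most 2**32 bytes, which is the 4 GB limit.
print(f"Pointer size: {pointer_bits()} bits, 64-bit build: {is_64bit()}")
```

Running this in the Python environment you intend to use for multiprocessing is a simple way to confirm which kind of interpreter your script actually runs under.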

So, bearing the top two points in mind, we should make use of memory workspaces wherever possible and we should avoid writing to FGDBs in our worker functions. We could still use FGDBs in our master function, for example to merge a number of shapefiles or even individual FGDBs back into a single source.
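The pattern described here (each worker writes its own output, and only the master combines them) can be sketched without arcpy. In the sketch below, plain text files stand in for per-worker shapefiles, and the names worker, merge, and part_N.txt are hypothetical. In an arcpy script, the merge step might instead be a geoprocessing call such as arcpy.management.Merge, run only in the master function.

```python
import multiprocessing
import os
import tempfile


def worker(job):
    """Write one job's result to its own file and return the file path."""
    job_id, out_dir = job
    # Each worker gets a unique output path, so no two processes
    # ever write to the same target at the same time.
    out_path = os.path.join(out_dir, f"part_{job_id}.txt")
    with open(out_path, "w") as f:
        f.write(f"result for job {job_id}\n")
    return out_path


def merge(part_paths, merged_path):
    """Combine the per-worker outputs; intended to run in the master only."""
    with open(merged_path, "w") as out:
        for path in sorted(part_paths):
            with open(path) as part:
                out.write(part.read())
    return merged_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        jobs = [(i, out_dir) for i in range(4)]
        with multiprocessing.Pool(processes=2) as pool:
            parts = pool.map(worker, jobs)
        # Only the master process writes the single combined output.
        merged = merge(parts, os.path.join(out_dir, "merged.txt"))
        with open(merged) as f:
            print(f.read())
```

The design choice is that the workers never share a write target. The single-writer restriction of the output format then only matters in the merge step, which runs in one process.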

Since we work with other packages such as arcpy, it is important to note that classes within arcpy such as the Featureclass, Layer, Table, Raster, etc. cannot be returned from the worker processes without creating custom serializers to serialize and deserialize the objects between processes. This is known as pickling, which is the process of converting an object to a byte stream and back to an object. This method is beyond the scope of this course, but built-in classes and objects within Python can be returned. For our example, we will return a dictionary containing information about the process.
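To make the pickling point concrete, here is a minimal sketch using only the standard library. The multiprocessing module pickles whatever a worker returns in order to send it back to the parent process, so the round trip below imitates that step in a single process. FakeGeoObject is a made-up stand-in for an object that cannot be pickled (it holds a thread lock); it is not an arcpy class, and the dictionary keys are only an assumption about what a worker might report.

```python
import pickle
import threading


class FakeGeoObject:
    """Hypothetical stand-in for an object that cannot be pickled."""

    def __init__(self, name):
        self.name = name
        # Thread locks cannot be pickled, so neither can this object.
        self._lock = threading.Lock()


def worker(job_id):
    """Do some (imaginary) work and report on it with a plain dictionary."""
    return {"job_id": job_id, "success": True, "message": f"job {job_id} done"}


def can_pickle(obj):
    """Return True if obj survives pickling, as a worker's return value must."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


result = worker(1)
# Round trip: object -> byte stream -> object, as happens between processes.
restored = pickle.loads(pickle.dumps(result))
print("dictionary survives the round trip:", restored == result)
print("dictionary can be pickled:", can_pickle(result))
print("stand-in object can be pickled:", can_pickle(FakeGeoObject("roads")))
```

A dictionary of strings, numbers, and booleans passes through unchanged, which is why it is a convenient return type for worker functions.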

Lesson content developed by Jan Wallgrun and James O’Brien