NGA Advanced Python Programming for GIS, GLGI 3001-1

Multiprocessing


Python offers several ways to run work in parallel, each with its own pros and cons. The major hindrance with Python’s multiprocessing is the serialization and deserialization of data passed to and from the worker processes. This is called ‘pickling’ and is discussed in more detail later in the section. It is important to note that custom classes such as those found in arcpy (geoprocessing results, feature classes, or layers) need a custom serializer and deserializer before they can be returned from a worker process. These take significant work and coding to create, and I have yet to see one. Trying to return an object that cannot be pickled raises an exception saying so, so it is safest to return built-in types such as strings, numbers, lists, and dictionaries.

The method of multiprocessing that we will be using builds on the map method covered earlier in the lesson, in the form of starmap(). The ‘star’ refers to argument unpacking: each item in the list is a tuple of arguments that is unpacked into the function call. The method distributes the items across the pool’s worker processes and returns the results as a list, in the same order as the inputs.
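As a rough illustration of the two ideas above, the sketch below runs a small function through starmap() and checks whether a value can be pickled. The names buffer_name, can_pickle, and HandleWrapper, and the layer names, are made up for this example and are not part of arcpy or the lesson scripts. HandleWrapper is only a stand-in for an object that holds a live resource and therefore cannot be sent between processes.

```python
import multiprocessing as mp
import pickle
import threading


def buffer_name(layer, distance):
    """Hypothetical worker: returns a plain string, which pickles without trouble."""
    return f"{layer}_buffer_{distance}m"


def can_pickle(obj):
    """Return True if obj can be pickled, the step used to move data between processes."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


class HandleWrapper:
    """Stand-in (assumption) for an object wrapping a live resource, here a thread lock."""

    def __init__(self):
        self.lock = threading.Lock()


if __name__ == "__main__":
    # Each tuple holds the arguments for one call to buffer_name.
    jobs = [("roads", 100), ("rivers", 250), ("parcels", 50)]

    with mp.Pool(processes=3) as pool:
        # starmap unpacks each tuple, e.g. buffer_name("roads", 100).
        results = pool.starmap(buffer_name, jobs)

    print(results)
    print("list of strings picklable:", can_pickle(results))
    print("wrapper object picklable:", can_pickle(HandleWrapper()))
```

The `if __name__ == "__main__":` guard matters on Windows, where each worker process re-imports the script. Returning a string such as an output path, rather than the geoprocessing object itself, sidesteps the pickling problem.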

What if you wanted to run different scripts at the same time? The starmap() method is great for running a single function many times over a list of inputs, but you can also be more explicit by using the pool.apply_async() method. Instead of using the map construct, you assign each submitted task to a variable and then call .get() on it for the result. Note that the parameters need to be passed as a tuple. A single parameter is passed as (arg,), with the trailing comma; more than one parameter is passed as (arg1, arg2, arg3).

For example:

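import multiprocessing as mp

# scriptA, scriptB, and scriptC are the functions to run; param1 and param2 are their arguments.
with mp.Pool(processes=5) as pool:
    p1 = pool.apply_async(scriptA, (param1,))
    p2 = pool.apply_async(scriptB, (param1, param2))
    p3 = pool.apply_async(scriptC, (param1,))

    # Collect the results while the pool is still open.
    res = [p1.get(), p2.get(), p3.get()]  # … one .get() per submitted task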
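The snippet above uses placeholder names, so here is a minimal self-contained sketch of the same pattern. The functions count_features, clip_layer, and failing_task are hypothetical stand-ins for the scripts, and collect is a small helper invented for this example. It also shows one behavior worth knowing: an exception raised inside a worker is re-raised in the main process when .get() is called.

```python
import multiprocessing as mp


def count_features(layer):
    """Hypothetical stand-in for a script that takes one parameter."""
    return f"counted {layer}"


def clip_layer(layer, boundary):
    """Hypothetical stand-in for a script that takes two parameters."""
    return f"clipped {layer} to {boundary}"


def failing_task(layer):
    """Hypothetical task that fails, to show how errors surface through .get()."""
    raise ValueError(f"cannot process {layer}")


def collect(async_results, timeout=60):
    """Call .get() on each result, turning a worker exception into a message."""
    outputs = []
    for res in async_results:
        try:
            outputs.append(res.get(timeout=timeout))
        except Exception as exc:
            outputs.append(f"failed: {exc}")
    return outputs


if __name__ == "__main__":
    with mp.Pool(processes=3) as pool:
        # Arguments are always passed as a tuple, even when there is only one.
        p1 = pool.apply_async(count_features, ("roads",))
        p2 = pool.apply_async(clip_layer, ("roads", "county"))
        p3 = pool.apply_async(failing_task, ("rivers",))

        # Call .get() inside the with block: leaving it terminates the pool.
        print(collect([p1, p2, p3]))
```

Wrapping .get() in a try/except, as collect does, lets one failed task be reported without losing the results of the others. The 60 second timeout is an arbitrary choice for the sketch.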