Part 1 – IDE Research (25% of Project 1 score)
For the first part of the Lesson 1 homework project, you will be evaluating an IDE. Each student will be evaluating a different IDE and can “claim” their IDE in the "L1: IDE Investigation: Choose topic" discussion forum within Canvas. Possible IDEs include but are not limited to the following (please do NOT choose Spyder!):
- PyScripter
- PyCharm
- Visual Studio with Python plugin
- Eric
- PyDev
- Wing
- Notepad++ with Python plugin
Part 1 Deliverable
First, claim your IDE in the "L1: IDE Investigation: Choose topic" discussion forum. Then experiment with writing and debugging code in that IDE and study its documentation. Pay special attention to which of the features mentioned in Section 1.9 (auto-completion, syntax checking, version control, environment control, and project organization) are available in that IDE. Record a 5-minute demo and discussion video of your chosen IDE using Kaltura that highlights the IDE's features, functionalities, and possible difficulties. Post a link to your video in the Media Gallery.
Part 2 – Python coding and profiling (75% of Project 1 score)
We are going to use the arcpy vector data processing code from Section 1.6.6.2 (download Lesson1_Assignment_initial_code.py) as the basis for our Lesson 1 programming project. The code is already written in multiprocessing mode, so you will not have to write multiprocessing code from scratch, but you will still need a good understanding of how the script works. If you are unclear about anything the script does, please ask on the course forums. This part of the assignment is intended to get you back into the rhythm of writing arcpy-based Python code and to give you practice creating a script tool with ArcGIS Pro. Your task is to extend our vector data clipping script by doing the following:
1. Modify the code to handle a parameterized output folder path (still using unique output filenames for each shapefile) defined in a third input variable at the beginning of the main script file. One way to achieve this is by adding another (5th) parameter to the worker() function to pass the output folder information along with the other data (see the first sketch after this list).
2. Implement and run simple code profiling using the time module as in Section 1.6 (see the second sketch after this list) and then perform basic profiling in Spyder as we did in Section 1.7.2.1 (no visual or line profiling needed). You won't be able to get profiling results for the subprocesses running the worker() function this way, but in your write-up you should report the total time and the computation times you get for the main functions of the script involved, and explain where most of the time is spent. Also include a screenshot showing the profiling results in Spyder.
3. Create an ArcGIS Pro script tool for running the modified code. The script tool should have three parameters allowing the user to provide the clipper feature class, the to-be-clipped feature class, and the output folder.
4. Expand the code so that it can handle multiple input feature classes to be clipped (still using a single polygon clipping feature class). The input variable tobeclipped should now take a list of feature class names rather than a single name. The worker function should, as before, perform the operation of clipping a single input file (not all of them!) to one of the features from the clipper feature class. The main change you will have to make will be in the main code where the jobs are created (see the third sketch after this list). The names of the output files produced should have the format
clip_<oid>_<name of the input feature class>.shp
so, for instance, clip_0_Roads.shp for clipping the Roads feature class from USA.gdb to the state with oid 0. Do this after the profiling stage; you do not need to run profiling for this version or create a script tool for it, so you may want to replace the calls to GetParameterAsText() with hardcoded paths again.
5. Successful delivery of the above requirements is sufficient to earn 90% on the project. The remaining 10% is reserved for efforts that go "over and above" the minimum requirements. Over-and-above points may be earned by sharing your profiling results only (not the code and not the other parts of your write-up!) by uploading them to GitHub, adding in-tool documentation, creating a script tool for the multiple-input-files version from step (4), adding further geoprocessing operations (e.g., reprojection) to the worker() function, or other enhancements as you see fit. You can also try to improve the efficiency of the code based on the profiling results.
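To illustrate step (1), here is a minimal sketch of what the extended worker() function could look like. It assumes the four-parameter signature worker(clipper, tobeclipped, field, oid) and the layer/clip logic of the Section 1.6.6.2 script; treat it as one possible approach, not the required solution:

```python
import os
import arcpy

def worker(clipper, tobeclipped, field, oid, outputFolder):
    """Clip the to-be-clipped feature class to the clipper feature with the
    given oid and write the result to a unique shapefile in outputFolder."""
    query = '"{0}" = {1}'.format(field, oid)
    # layer name includes the oid so that parallel workers do not collide
    arcpy.MakeFeatureLayer_management(clipper, "clipper_" + str(oid), query)
    # unique output filename, now rooted at the parameterized output folder
    outFC = os.path.join(outputFolder, "clip_" + str(oid) + ".shp")
    arcpy.Clip_analysis(tobeclipped, "clipper_" + str(oid), outFC)
```

Remember that the job tuples created in mp_handler() then need a matching fifth element carrying the output folder.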
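For the time-module profiling from step (2), a pattern along the lines of Section 1.6 is sufficient; mp_handler() stands in here for whichever function you are timing:

```python
import time

startTime = time.time()
mp_handler()                       # the code whose runtime we want to measure
endTime = time.time()
print("Total time: {0:.2f} seconds".format(endTime - startTime))
```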
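And for step (4), the core change is a nested loop that creates one job per combination of clipper feature and input feature class. The sketch below uses illustrative placeholder values; in your script, variables like clipper, field, and oidList will already exist or be derived from the clipper feature class:

```python
import os

# illustrative placeholders; in the real script these values come from the
# input variables at the top of the file (or from GetParameterAsText() calls)
clipper = r"C:\489\USA.gdb\States"
field = "OBJECTID"
outputFolder = r"C:\489\output"
oidList = [0, 1, 2]                       # normally read from the clipper FC
tobeclipped = [r"C:\489\USA.gdb\Roads",   # now a list of feature classes
               r"C:\489\USA.gdb\Hydrology"]

jobs = []
for oid in oidList:                       # one job per (oid, input FC) pair
    for fc in tobeclipped:
        jobs.append((clipper, fc, field, oid, outputFolder))
```

Inside worker(), the output name then becomes os.path.join(outputFolder, "clip_{0}_{1}.shp".format(oid, fcName)), where fcName is derived from the input feature class path as described in Hint 2 below.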
You will have to submit several versions of the modified script for this assignment:
- (A) The modified single-input-file script tool version from step (3) above together with the .tbx file for your toolbox.
- (B) The multiple-input-files version from step (4).
- (C) Potentially a third version if you made substantial modifications to the code for "over and above" points (step (5) above). If you created a new script tool for this, make sure to include the .tbx file as well.
For all modified code versions in this assignment, the main modifications should be made to the input variables and within the code of the worker() and mp_handler() functions; the code of the get_install_path() function should be left unchanged. Of course, we will also look at code quality, so make sure the code is readable and well documented. Here are a few more hints that may be helpful:
Hint 1:
When you adapt the worker() function, I strongly recommend that you do some tests with individual calls of that function before you run the full multiprocessing version. For this, you can, for instance, comment out the pool code and instead call worker() directly from the loop that produces the job list, so that all calls are made sequentially rather than in parallel (see the sketch below). This makes it much easier to detect errors than running everything in multiprocessing mode right away. Similarly, it is a good idea to add print statements that print out the parameter tuples placed in the job list to make sure the correct values will be passed to the worker function.
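As a sketch, the temporary sequential setup could look like this inside your script (the starmap() call in the comment is just one common way of invoking the pool; match whatever your script actually uses):

```python
jobs = []
for oid in oidList:
    job = (clipper, tobeclipped, field, oid, outputFolder)
    print(job)                  # check that the tuple holds the expected values
    jobs.append(job)

# pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
# pool.starmap(worker, jobs)    # the real, parallel version (commented out)
for job in jobs:                # temporary sequential test run
    worker(*job)
```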
Hint 2 (concerns step (4)):
When changing to the multiple-input-files version, you will not only have to change the code that produces the name of the output files in variable outFC by incorporating the name of the input feature class; you will also have to do the same for the name of the temporary layer created by MakeFeatureLayer_management() to make sure that the layer names remain unique. Otherwise, some worker calls will fail because they try to create a layer with a name that is already in use.
To get the basename of a feature class without the file extension, you can use a combination of the os.path.basename() and os.path.splitext() functions from the os.path module of the Python standard library. The basename() function removes the leading path (e.g., it turns "C:\489\data\Roads.shp" into just "Roads.shp"). The expression os.path.splitext(filename)[0] gives you the filename without the file extension, so "Roads.shp" becomes just "Roads". (Using [1] instead of [0] would give you just the file extension, but you won't need that here.)
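In code, the combination looks like this (a small self-contained example):

```python
import os

filename = r"C:\489\data\Roads.shp"
base = os.path.basename(filename)      # "Roads.shp"  (leading path removed)
name = os.path.splitext(base)[0]       # "Roads"      (extension removed)
print(name)                            # prints: Roads
```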
Hint 3 (concerns steps (4) and (5)):
This is not required, but if you decide to create a script tool for the multiple-input-files version from step (4) for over-and-above points, you will have to use the "Multiple value" option for the input parameter you create for the to-be-clipped feature class list in the script tool interface. If you then use GetParameterAsText(...) for this parameter in your code, what you get is a single string(!) with the names of the feature classes the user picked, separated by semicolons, not a list of name strings. You can either use the string method split(...) to turn this string into a list of feature class names, or use GetParameter(...) instead of GetParameterAsText(...), which will directly give you the feature class names as a list (see the sketch below).
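Both options in code (the parameter index 1 is just an example position; pick one of the two lines):

```python
import arcpy

# option 1: split the semicolon-separated string into a list of names
tobeclipped = arcpy.GetParameterAsText(1).split(";")

# option 2: let arcpy hand you the values as a list directly
tobeclipped = arcpy.GetParameter(1)
```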
Part 2 Deliverable
Submit a single .zip file to the corresponding drop box on Canvas; the zip file should contain:
- Your modified code files and ArcGIS Pro toolbox files (up to three different versions as described above). Please organize the files cleanly, e.g., using a separate subfolder for each version.
- A 400-word write-up of what you have learned during this exercise. This write-up should also include your profiling results and insights (including the Spyder profiling screenshot) and a description of what you did for "over and above" points (if anything). In addition, think back to the beginning of Section 1.6.6 and include a brief discussion of any changes to the processing workflow and/or the code that might be necessary if we wanted to write our output data to geodatabases, and briefly comment on possible issues (using pseudocode or a simple flowchart if you wish).