NGA Advanced Python Programming for GIS, GLGI 3001-1

Lesson 2: Open Source Data


Hopefully lesson 1 wasn’t too hard and you learned something new. Lesson 2 looks at how to use Python to retrieve data in many different formats and translate it to other formats. This process is known by the acronym ETL, for Extract, Transform (or Translate), and Load, and it is a large part of any script that works with data. Once you get the hang of working through data structures in Python, many possibilities open up for task automation and efficiency improvements.
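To make the three ETL steps concrete, here is a minimal sketch using only the Python standard library. It is not taken from the lesson materials: the sample data, the column names (name, lat, lon), and the function names extract, transform, and load are all hypothetical and chosen purely for illustration. It reads CSV text into dictionaries, converts each row into a GeoJSON-style point feature, and serializes the result as JSON.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a CSV file on disk.
SAMPLE_CSV = """name,lat,lon
State College,40.7934,-77.8600
Harrisburg,40.2732,-76.8867
"""


def extract(csv_text):
    """Extract: parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: turn each row into a GeoJSON-style point feature."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates in longitude, latitude order.
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


def load(feature_collection):
    """Load: serialize the result as a JSON string (a stand-in for writing to a file or database)."""
    return json.dumps(feature_collection, indent=2)


if __name__ == "__main__":
    print(load(transform(extract(SAMPLE_CSV))))
```

In a real script the extract step would typically open a file or query a database, and the load step would write to a file, a feature class, or a web service. Keeping the three steps as separate functions makes it easy to swap out any one of them.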

This lesson is more relaxed than the first, but it still covers many topics. You will learn how to read and parse a CSV file, access data in multiple database flavors, parse KML and KMZ files, convert structured data such as feature classes to dictionaries, and lastly, use Python to retrieve open source data from the web.

We will also take a look at Esri’s GIS module from their ArcGIS API for Python. This portion was designed for Jupyter Notebooks, but the code can be run in PyScripter as a script. If you are interested in stepping through the material within a Notebook, I encourage you to reference the 3rd lesson in Penn State's full GEOG 489 course. The reason we are not pursuing Notebooks here is that the environment can take a very long time to set up and can involve a lot of troubleshooting if the environment does not resolve.

This is not required for this lesson, but a simple and fast way to access a Jupyter Notebook through ArcGIS Pro is to follow the instructions in "How to use ArcGIS Notebooks in ArcGIS Pro." Note that we will be going over Jupyter Notebooks in lesson 4.