NGA Advanced Python Programming for GIS, GLGI 3001-1

Python Packages for (spatial) Data Science


It would be impossible to introduce or even just list all the packages available for conducting spatial data analysis projects in Python here, so the following is just a small selection of those that we consider most important.

numpy

numpy (Python numpy page, Wikipedia numpy page) stands for “Numerical Python”. It is a library that adds efficient support for large, multi-dimensional arrays and matrices to Python, together with a large number of mathematical operations that can be applied to these arrays, including many matrix and linear algebra operations. Many other Python packages are built on top of the functionality provided by numpy.
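
As a minimal sketch of what array-based computation looks like, the following example creates a small two-dimensional array and applies an element-wise operation, an aggregation along one axis, and a matrix product. The values and variable names are made up for illustration.

```python
import numpy as np

# A 2 x 3 array of made-up values (illustration only)
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Element-wise arithmetic is applied to the whole array at once, without a loop
doubled = grid * 2

# Aggregations can run over the whole array or along a single axis
column_means = grid.mean(axis=0)

# Matrix product of the 2 x 3 array with its 3 x 2 transpose (result is 2 x 2)
product = grid @ grid.T

print(doubled)
print(column_means)
print(product)
```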

matplotlib

matplotlib (Python matplotlib page, Wikipedia matplotlib page) is an example of a Python library that builds on numpy. Its main focus is on producing plots and embedding them into Python applications. Take a quick look at its Wikipedia page to see a few examples of plots that can be generated with matplotlib. We will be using matplotlib a few times in this lesson’s walkthrough to quickly create simple map plots of spatial data.
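
To give a rough idea of how a simple plot is put together, here is a small sketch that draws a scatter plot of three points. The coordinates are made up for illustration, and the non-interactive "Agg" backend is an assumption chosen so that the script also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Made-up point coordinates (illustration only)
lons = [-77.86, -75.16, -79.99]
lats = [40.79, 39.95, 40.44]

# Create a figure with a single set of axes and draw the points
fig, ax = plt.subplots()
ax.scatter(lons, lats)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Sample points")

# Render the figure in memory
fig.canvas.draw()
```

In an interactive session you would typically drop the backend line and call plt.show() to open the plot in a window, or call fig.savefig() with a file name to write the plot to an image file.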

SciPy

SciPy (Python SciPy page, Wikipedia SciPy page) is a large Python library for applications in mathematics, science, and engineering. It is built on top of numpy and provides methods for optimization, integration, interpolation, signal processing, and image processing. Together, numpy, matplotlib, and SciPy provide functionality roughly similar to that of the well-known software MATLAB. While we won’t be using SciPy in this lesson, it is definitely worth checking out if you're interested in advanced mathematical methods.
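
Although SciPy is not used in this lesson, the following sketch hints at three of the areas mentioned above: interpolation, optimization, and integration. The sample data and the functions being minimized and integrated are made up for illustration.

```python
import numpy as np
from scipy import integrate, interpolate, optimize

# Interpolation: estimate a value between known samples (made-up data)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 10.0, 20.0, 30.0])
linear = interpolate.interp1d(x, y)
estimate = float(linear(1.5))

# Optimization: find the value v that minimizes (v - 3)^2
result = optimize.minimize_scalar(lambda v: (v - 3.0) ** 2)

# Integration: numerically integrate v^2 from 0 to 3
area, abs_error = integrate.quad(lambda v: v ** 2, 0.0, 3.0)

print(estimate, result.x, area)
```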

pandas

pandas (Python pandas page, Wikipedia pandas software page) provides functionality for working efficiently with tabular data in the form of so-called data frames, much as is possible in R. Reading and writing tabular data (e.g., to and from .csv files), manipulating and subsetting data frames, merging and joining multiple data frames, and time series support are key functionalities provided by the library. A more detailed overview of pandas will be given in the upcoming section.
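
As a brief preview, the sketch below reads a small table from CSV text, subsets it, and joins it with a second data frame. All table contents and column names are made up for illustration. The CSV text is read from an in-memory string so that the example is self-contained; with a real file you would pass the file path to read_csv instead.

```python
import io

import pandas as pd

# Made-up CSV content (illustration only)
csv_text = """city,state,population
A,PA,100
B,PA,200
C,NY,300
"""

# Read tabular data into a data frame
cities = pd.read_csv(io.StringIO(csv_text))

# Subset: keep only the rows with a population above 150
large = cities[cities["population"] > 150]

# Join with a second data frame on the shared "state" column
zones = pd.DataFrame({"state": ["PA", "NY"], "zone": ["Z1", "Z2"]})
merged = cities.merge(zones, on="state", how="left")

# Aggregate: total population per state
totals = merged.groupby("state")["population"].sum()

print(large)
print(merged)
print(totals)
```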

Shapely

Shapely (Python Shapely page, Shapely User Manual) adds functionality for working with planar geometric features in Python, including the creation and manipulation of geometries such as points, polylines, and polygons, as well as set-theoretic analysis capabilities (intersection, union, …). It is based on the widely used GEOS library, the geometry engine that is used in PostGIS. GEOS in turn is based on the Java Topology Suite (JTS) and largely follows the OGC’s Simple Features Access Specification.
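
The following minimal sketch illustrates creating a point, a polyline, and two polygons, and then applying a few spatial predicates and set-theoretic operations to them. The coordinates are made up and treated as plain planar coordinates.

```python
from shapely.geometry import LineString, Point, Polygon

# Made-up planar geometries (illustration only)
point = Point(1.0, 1.0)
line = LineString([(0, 0), (4, 4)])
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
other = Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])

# Spatial predicates return True or False
inside = square.contains(point)
crosses = line.intersects(square)

# Set-theoretic operations return new geometries
overlap = square.intersection(other)
combined = square.union(other)

print(inside, crosses, overlap.area, combined.area)
```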

geopandas

geopandas (Python geopandas page, GeoPandas page) combines pandas and Shapely to facilitate working with geospatial vector data sets in Python. While we will mainly use it to create a shapefile from Python, the provided functionality goes significantly beyond that and includes geoprocessing operations, spatial joins, projections, and map visualizations.

GDAL/OGR

GDAL/OGR (Python GDAL page, GDAL/OGR Python) is a powerful library for working with GIS data in many different formats, and it is widely used from different programming languages. Originally, it consisted of two separate libraries, GDAL (‘Geospatial Data Abstraction Library’) for working with raster data and OGR (which used to stand for ‘OpenGIS Simple Features Reference Implementation’) for working with vector data, but these have now been merged. The gdal Python package provides an interface to the GDAL/OGR library, which is written in C++.

ArcGIS API for Python

As we already mentioned in the last lesson, Esri provides its own Python API (ArcGIS for Python page) for working with maps and GIS data via its ArcGIS Online and Portal for ArcGIS web platforms. The API allows for conducting administrative tasks, performing vector and raster analyses, running geocoding tasks, creating map visualizations, and more. While some services can be used autonomously, many are tightly coupled to Esri’s web platforms, and you will need at least a free ArcGIS Online account. The ArcGIS API for Python will be further discussed in Lesson 4.

Lesson content developed by Jan Wallgrun and James O’Brien