NGA Advanced Python Programming for GIS, GLGI 3001-1

KML and KMZ


KML (Keyhole Markup Language) is a friendly competitor or companion to the shapefile format, and several GIS software suites use it to display geographic data. Some sample KML and KMZ files can be downloaded. KML organizes features into placemarks, paths, and polygons, and groups like features into these categories. When you import a KML into a GIS, you will see these three categories as separate layers, even if each one holds different features with different attributes. For example, if you have a KML of hotels and soccer field locations, both would be combined within the placemark features. Esri provides some conversion tools for KML files, but they do not offer the ability to split on an attribute (separating hotels from soccer fields) during the import. Compartmentalizing your data by feature type therefore becomes a multistep process.
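To see why hotels and soccer fields end up in the same layer, it helps to look at how placemarks are stored. The following is a minimal sketch using only the Python standard library; the sample KML text, the placemark names, and the function name group_by_geometry are made up for illustration and are not part of the course data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace used by KML 2.2 documents.
KML_URI = "http://www.opengis.net/kml/2.2"
KML_NS = {"kml": KML_URI}

# Hypothetical sample: two different kinds of point features plus one path.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark><name>Hotel A</name>
      <Point><coordinates>-105.2,41.1,0</coordinates></Point></Placemark>
    <Placemark><name>Soccer Field 1</name>
      <Point><coordinates>-105.3,41.2,0</coordinates></Point></Placemark>
    <Placemark><name>Trail</name>
      <LineString><coordinates>-105.2,41.1,0 -105.3,41.2,0</coordinates></LineString></Placemark>
  </Document>
</kml>"""


def group_by_geometry(kml_text):
    """Group placemark names by the geometry element each placemark contains."""
    root = ET.fromstring(kml_text)
    groups = defaultdict(list)
    for placemark in root.iter(f"{{{KML_URI}}}Placemark"):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        for geometry in ("Point", "LineString", "Polygon"):
            if placemark.find(f"kml:{geometry}", KML_NS) is not None:
                groups[geometry].append(name)
    return dict(groups)


grouped = group_by_geometry(SAMPLE_KML)
print(grouped)
```

Both point placemarks land in the same group regardless of what they represent, which is why a second step is needed to separate them by attribute.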

Several Python packages provide tools to access the features' data and parse it into familiar feature class or GeoJSON structures. One package worth knowing is kml2geojson, which extracts the KML data into GeoJSON format. The steps are mostly the same, but they involve writing the JSON to a file so that Esri's JSON To Features tool (arcpy.conversion.JSONToFeatures) can read it.

from zipfile import ZipFile
import kml2geojson
import os
import json

outDir = r'C:\NGA\kml kmz data\kml kmz data'
kmz = os.path.join(outDir, r'CurtGowdyStatePark.kmz')

# extract the kml file
with ZipFile(kmz) as kmzFile:
    # extract files
    kmzFile.extractall(outDir)

# convert the doc.kml to a geojson object
kml_json = kml2geojson.main.convert(fr'{outDir}\doc.kml', r'curt_gowdy')
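The script above assumes the KML inside the archive is named doc.kml. That is the usual convention, but a KMZ is an ordinary zip archive and the main KML can carry another name. As a hedged sketch, the helper below (read_main_kml is a hypothetical name) returns the first .kml entry it finds and assumes the file is UTF-8 encoded. It builds a tiny in-memory archive so it runs without any files on disk.

```python
import io
from zipfile import ZipFile


def read_main_kml(kmz_source):
    """Return (member_name, kml_text) for the first .kml entry in a KMZ archive."""
    with ZipFile(kmz_source) as kmz_file:
        kml_names = [n for n in kmz_file.namelist() if n.lower().endswith(".kml")]
        if not kml_names:
            raise ValueError("no .kml file found in the archive")
        # Assumption: the KML text is UTF-8 encoded.
        return kml_names[0], kmz_file.read(kml_names[0]).decode("utf-8")


# Build a small stand-in KMZ in memory (not the course's sample data).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("doc.kml", "<kml></kml>")
    archive.writestr("files/icon.png", b"")
buffer.seek(0)

name, text = read_main_kml(buffer)
print(name)
```

The same function accepts a path on disk, such as the kmz variable used earlier, because ZipFile takes either a path or a file-like object.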

From here, you have a GeoJSON object that you can convert to a feature class with the JSON To Features tool, import into other analytical tools such as geopandas or pandas dataframes, or use in APIs like the ArcGIS API for Python. Having the data in this portable format also helps with extracting data from the KML. For example, if you were tasked with getting a list of all places within the KML, you could open the dataset in a mapping program like Google Earth, QGIS, or ArcGIS Pro and copy out each requested attribute. Now that you know how to parse the KML, you could instead let Python and a few other packages do the work for you. The following script gets the place name of each feature using the geopandas package:

import kml2geojson
import os
import geopandas as gpd

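# folder that holds the sample data (same folder as in the previous script)
outDir = r'C:\NGA\kml kmz data\kml kmz data'

# read in the kml and convert to geojson.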
kml_json = kml2geojson.main.convert(fr'{outDir}\CurtGowdyArchery.kml', r'curt_gowdy_archery')

# ETL to geopandas for manipulation/ data extraction
gdf = gpd.GeoDataFrame.from_features(kml_json[0]["features"])

place_names = list(set(gdf['name']))

print(place_names)

The features returned by kml2geojson are read into a geopandas GeoDataFrame, where you can use pandas functions to access the values. You could instead drill down into kml_json and loop over its list of features, though geopandas does this for us in less code.
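For comparison, the loop-based approach can be sketched in plain Python. The feature_collection dictionary below is a hypothetical stand-in for kml_json[0], with invented place names, so the example runs without kml2geojson or the sample data.

```python
# Hypothetical stand-in for kml_json[0]: a GeoJSON FeatureCollection dict.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": "Target 1"},
         "geometry": {"type": "Point", "coordinates": [-105.2, 41.1]}},
        {"type": "Feature",
         "properties": {"name": "Target 2"},
         "geometry": {"type": "Point", "coordinates": [-105.3, 41.2]}},
        {"type": "Feature",
         "properties": {"name": "Target 1"},
         "geometry": {"type": "Point", "coordinates": [-105.4, 41.3]}},
    ],
}


def unique_names(collection):
    """Return the sorted, de-duplicated 'name' values of a FeatureCollection."""
    names = set()
    for feature in collection["features"]:
        # GeoJSON allows "properties" to be null, so fall back to an empty dict.
        name = (feature.get("properties") or {}).get("name")
        if name is not None:
            names.add(name)
    return sorted(names)


place_names = unique_names(feature_collection)
print(place_names)
```

This needs no third-party packages, at the cost of handling missing properties yourself.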

To write the GeoJSON to a file, add a call to json.dump() to our script:

...
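# write the converted FeatureCollection to a file, using its name property
# as the output filename. kml_json is a list, so dump its first item; dumping
# the whole list would write a JSON array rather than a GeoJSON object.
with open(fr'{outDir}\{kml_json[0]["name"]}.geojson', 'w') as file:
    json.dump(kml_json[0], file)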

...
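Because the features are plain dictionaries at this point, the attribute split that the import tools do not perform can be done before writing. The sketch below is one possible approach, not a kml2geojson feature: the category property, the sample features, and both function names are hypothetical, and the property values are assumed to be safe to use as file names.

```python
import json
import os
import tempfile
from collections import defaultdict


def split_by_property(collection, key):
    """Split one FeatureCollection into several, keyed by a property value."""
    groups = defaultdict(list)
    for feature in collection["features"]:
        value = (feature.get("properties") or {}).get(key, "unknown")
        groups[value].append(feature)
    return {value: {"type": "FeatureCollection", "features": features}
            for value, features in groups.items()}


def write_collections(collections, out_dir):
    """Write each FeatureCollection to <out_dir>/<value>.geojson and return the paths."""
    paths = []
    for value, collection in collections.items():
        path = os.path.join(out_dir, f"{value}.geojson")
        with open(path, "w", encoding="utf-8") as file:
            json.dump(collection, file)
        paths.append(path)
    return paths


# Hypothetical sample data with an invented "category" property.
sample = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"name": "Hotel A", "category": "hotel"},
     "geometry": {"type": "Point", "coordinates": [-105.2, 41.1]}},
    {"type": "Feature", "properties": {"name": "Hotel B", "category": "hotel"},
     "geometry": {"type": "Point", "coordinates": [-105.3, 41.2]}},
    {"type": "Feature", "properties": {"name": "Field 1", "category": "soccer_field"},
     "geometry": {"type": "Point", "coordinates": [-105.4, 41.3]}},
]}

groups = split_by_property(sample, "category")
with tempfile.TemporaryDirectory() as tmp_dir:
    written = sorted(os.path.basename(p) for p in write_collections(groups, tmp_dir))
print(written)
```

Each output file holds a single FeatureCollection, so it can be converted to its own feature class afterwards.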

Esri Tools

Esri provides several geoprocessing tools that assist in the ETL of KML files. KML To Layer converts a .kml or .kmz file into feature classes and a symbology layer file. The tool creates a file geodatabase and parses the KML features into the respective point, polyline, and polygon feature classes as part of a feature dataset. It uses a layer file to maintain the symbology included in the KMZ. The example below takes it a step further and splits the resulting feature classes into individual feature classes based on their attributes.

import os
from zipfile import ZipFile

import arcpy

outDir = r'C:\NGA\kml kmz data'
file_name = 'CurtGowdyStatePark'
kmz = os.path.join(outDir, f'{file_name}.kmz')

# extract the kml file
with ZipFile(kmz) as kmzFile:
    # extract files
    kmzFile.extractall(outDir)

# convert the kml/kmz to featureclasses in a gdb.
arcpy.conversion.KMLToLayer(fr"{outDir}\doc.kml", fr"{outDir}\{file_name}", file_name)

# Change the workspace to the gdb created by KMLToLayer. The method creates a file
# geodatabase named after the output data name (here, file_name) in the output folder.
arcpy.env.workspace = fr'{outDir}\{file_name}\{file_name}.gdb'

# get the featureclasses created by the KMLToLayer method
fcs = [fc for fc in arcpy.ListFeatureClasses(feature_dataset='Placemarks')]

# Set the fields that will be used for splitting the geometry into named featureclasses.
# Multiple fields can be used, e.g. ['Name', 'SymbolID']
splitDict = {'Points': 'SymbolID',
             'Polylines': 'Name',
             'Polygons': 'Name'}

# iterate over featureclasses and execute the split by attributes.
for fc in fcs:
    arcpy.SplitByAttributes_analysis(fc, arcpy.env.workspace, splitDict.get(fc))
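
One detail of the loop above is that splitDict.get(fc) returns None for any feature class that has no entry in the dictionary. The following is a hedged sketch of a small guard that runs without arcpy. The function name plan_splits and the extra 'Multipatches' entry are assumptions used only to illustrate the unmapped case.

```python
def plan_splits(feature_classes, split_fields):
    """Pair each feature class with its list of split fields, skipping unmapped ones."""
    plan = []
    for fc in feature_classes:
        fields = split_fields.get(fc)
        if fields is None:
            # No split fields configured for this feature class, so skip it.
            continue
        if isinstance(fields, str):
            # Normalize a single field name to a one-item list.
            fields = [fields]
        plan.append((fc, list(fields)))
    return plan


split_dict = {'Points': 'SymbolID',
              'Polylines': 'Name',
              'Polygons': ['Name', 'SymbolID']}

# 'Multipatches' stands in for a feature class with no entry in split_dict.
plan = plan_splits(['Points', 'Polygons', 'Multipatches'], split_dict)
print(plan)
```

Each (fc, fields) pair in the plan could then be passed to the split call inside the loop, in place of splitDict.get(fc).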