There's so much to learn about ArcGIS Server and GIS server technology in general that it's impossible to cover it all in this course. Instead, we've chosen to focus on some of the issues most commonly faced by people setting up and running a GIS server. In Lesson 2, you learned how to set up a server and a web service, and you viewed that service on the web. In Lesson 3, you took that a step further and learned how to prepare data for editing over the web. You also made a fully-featured web application.
In Lesson 4, you will learn how to build rasterized tile caches to improve the speed of your map services. This is a practice used by major web mapping services such as Google Maps, Bing Maps, MapQuest, and the ArcGIS Online services that you have already used in this course.
Building and maintaining tile caches requires careful strategy and planning, far beyond just knowing how to push the buttons to make tiles. For this reason, map tiling can be a fun and intriguing subject to study.
At the successful completion of this lesson you should be able to:
By this point in the course, you may have observed that there's more than one way to take raw GIS data from your server and put it into a map in someone's web browser. Recall some of the map services you used in the previous two lessons:
Like the two other choices above, tiled maps also have their unique drawbacks. The biggest one is the time investment and server power needed to generate the cache, along with the disk space necessary to store it. Also, because a cache represents a snapshot of your data at one point in time, it requires maintenance. If your source data or your map symbology is edited, you have to update the corresponding tiles in order for people to see the changes.
In this lesson, you'll learn about designing a map with the goal of building a tile cache. You'll get a chance to make some tiles and use them on the web. Since the number of tiles in a cache can multiply with each scale level added and become unmanageably large, you'll also learn about strategies for building and updating very big caches.
A word about different tile types before we begin: There are two main types of tiles commonly used in web maps today. The kinds of tiles we've been talking about above can be thought of as rasterized tiles; in other words, they are images made up of grids of pixels. Rasterized tiles are easy for clients to draw because most apps and all web browsers know how to display an image like a JPG or a PNG; however, the server has to construct the image and, after that, you're stuck with the colors and symbols you chose.
To get around issues with rasterized tiles, another type of tile, called the vector tile, has been increasing in popularity. Vector tiles are similar in concept to rasterized tiles in the sense that they are square packets of information structured in a pyramid and sent by the server; however, they contain vector coordinates instead of a picture of the data. This allows the styling to be changed easily. Vector tiles are drawn as client-side graphics, so the client software needs to understand what a vector tile is and how to handle it. Older mapping software and APIs may not be able to consume vector tiles.
We will talk more about vector tiles in Lesson 5 when we work with Mapbox software, since Mapbox pioneered this format and based their company on it. Esri vector tile support [1] is growing, although it has lagged behind that of Mapbox.
Be aware that all the remaining content in Lesson 4 refers to rasterized tiles, and some of the design and performance considerations discussed may be very different when thinking about vector tiles.
Building rasterized cache tiles is CPU- and memory-intensive. Your server is making thousands of repetitive map draws, sometimes with a very complex MXD in the background. You can build a cache a lot faster if you assign the tile creation to a powerful machine.
This short-term need for high computing power is a perfect use case for cloud computing. A lot of offices don't have a powerful machine to spare for building tiles (usually their beefiest machine is the server that's already hosting their live apps and web services). In this situation, a server administrator could launch a high-memory and/or high-CPU instance for just a few hours for the purpose of building tiles. The extra cost is often worth the time it saves in building the cache. Once the tiles are created, the machine can be shut down or scaled down.
For this lesson only, you'll change your ArcGIS Server site to run on a memory-optimized instance [2]. This costs significantly more than the general purpose instance [3] type that you've been using, but it will allow you to work with a complex map document and build cache tiles much faster.
Now that you are running on an instance that costs (at the time of this writing) $1.064/hour as opposed to 40 cents/hour, it's more important than ever that you remember to stop your site when you are done working on your lesson materials for the day. Also, be sure to set your instance type back to m4.2xlarge after building all your tiles in Lesson 4.
You'll find during this lesson that a rasterized tiled map service takes a lot of planning. Let's look at a few of the considerations needed to get a map ready for publishing as a tiled service. You'll download and examine a predesigned map and publish it as a service in preparation for making some tiles yourself.
The first question to settle is whether or not to make a tile cache at all. If the map is going to put strain on your server or take a noticeable amount of time to draw (these two often go together), then you need to consider making a tile cache. Most vector basemaps that give geographic context to your web map contain a lot of layers and fall into this category. This is one reason that splitting up your layers into basemap services and business layer services is a good idea; you can potentially cache the basemap while leaving the business layers uncached.
Does it ever make sense to cache business layers, given that that kind of data changes more frequently? Sometimes it does: Google used to do it with the Wikipedia layer in Google Maps [6]. With so many features (Wikipedia articles) to show, and with the amount of traffic Google Maps receives, it was burdensome on the servers to draw those points on the fly. (Sadly, the Wikipedia layer is no longer offered.)
In addition to high-traffic scenarios, you can also consider caching business layers when the map covers a relatively small extent, the data doesn't change very often, or the data is displayed at small scales only. Layers like weather radar need to be updated frequently, but they are rarely viewed at large scales and require relatively few tiles in the cache, so the update can be performed in a reasonable amount of time.
There are a lot of decisions you need to make about how to set up your tile cache, but the first choice is the set of scales at which you are going to generate tiles. These scales represent the snapshots at which web users will see your map. They also determine how long it's going to take to create the cache, and which other web services the cache will be able to overlay. Ideally, you'll decide on your set of cache scales before you start designing your map.
Keep these things in mind when choosing a set of scales:
Creating detailed vector basemaps of the type that are typically cached presents a grand cartographic challenge. In contrast to paper cartography, in which the map has to be designed at just one scale, the web basemap has to be designed to look good at every scale in your tiling scheme.
Designing this type of multilevel basemap can require you to include varying symbols at different levels of your map. For example, a road might be represented with a 3-point line width at a large scale, a 1-point width at a medium scale, and may not be visible at all at a small scale. Since ArcMap does not allow scale-dependent symbols, you'll sometimes need to add multiple copies of the same layer into your map, set different scale ranges on them, then assign appropriate symbols for each scale range.
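When a map has several copies of the same layer, it's easy to leave a gap (roads vanish at some scale) or an overlap (two road symbols drawn at once). The sketch below is a plain-Python model of the road example above, not ArcMap's API; the layer names, scale ranges, and the rule used at a shared boundary are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayerCopy:
    """One copy of a layer in the map, with its own scale range and symbol."""
    name: str
    min_scale: float      # hidden when zoomed out beyond 1:min_scale (0 = no limit)
    max_scale: float      # hidden when zoomed in beyond 1:max_scale (0 = no limit)
    line_width_pt: float  # symbol width used by this copy

    def visible_at(self, scale: float) -> bool:
        too_far_out = self.min_scale > 0 and scale > self.min_scale
        too_far_in = self.max_scale > 0 and scale < self.max_scale
        return not (too_far_out or too_far_in)


# Hypothetical road layer added twice: 3 pt at large scales, 1 pt at medium
# scales; beyond 1:500,000 neither copy draws, so roads are hidden.
ROAD_COPIES: List[LayerCopy] = [
    LayerCopy("Roads (large scale)", min_scale=50_000, max_scale=0, line_width_pt=3),
    LayerCopy("Roads (medium scale)", min_scale=500_000, max_scale=50_000, line_width_pt=1),
]


def road_symbol_at(scale: float) -> Optional[LayerCopy]:
    """Return the copy that draws at this scale, or None if roads are hidden."""
    visible = [copy for copy in ROAD_COPIES if copy.visible_at(scale)]
    # Both copies are visible exactly at 1:50,000; this sketch prefers the first.
    return visible[0] if visible else None
```

Calling `road_symbol_at` for each scale in your tiling scheme is a quick way to confirm that every cache level gets exactly the symbol you intended.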
It's also important to choose muted colors for the basemap that look good but do not overwhelm other layers placed on top. Go to Google Maps: Designing the Modern Atlas [7] to see some examples of how the Google Maps design has toned itself down over time to be more accommodating to overlays. The Esri Light Gray Canvas basemap is another study in designing a basemap specifically as a backdrop for more important thematic or operational layers.
When web mapping exploded during the past two decades, some cartographers expressed their chagrin at the simple, uniform maps churned out by websites. Some may have thought their very jobs and livelihood were threatened. However, the years have shown that cartography holds a critical place in web mapping. Projects like the OpenStreetMap terrain layer [8] and the Esri World Topographic Map [9] incorporate very advanced cartographic techniques. In a sense, map tiling gave cartographers a ticket to ride in the web world, since these detailed maps would be too slow to serve dynamically.
No wonder some GIS professionals shrink at the thought of trying to design such a map on their own. Some organizations that lack an in-house cartographer have just limped along with the same symbols they used when more primitive map server technology was available. Others have imitated the colors and symbols of the ubiquitous Google Maps in their own basemaps (perhaps in response to a manager's demand, "Make our maps look like that!").
In response to queries about how the ArcGIS Online basemaps were constructed, Esri has released sample ArcMap documents using all the ArcGIS Online basemap symbols. People can insert their own data into the map or simply copy the symbol settings into their own maps. Examining one of these maps provides a good lesson in multilayer basemap design.
In this part of the lesson, you'll download and examine a map template that Esri has provided for the ArcGIS Online street map. This sample map covers the Little Rock, Arkansas region. You'll then publish the map as a service and get it ready for creating tiles in the next section of the lesson.
Now that you've finished designing your map, you're ready to start creating the cache of map tiles. As an advance notice, you should plan at least one continuous hour to work on this page of the lesson.
In this lesson, you'll learn how to create tiles using ArcGIS Server. However, tiles can be created using many other types of GIS and mapping utilities. One example is Mapnik [11], which is used to create the tiles for OpenStreetMap.
Map tiling has become so popular that the Open Geospatial Consortium (OGC) has even released the Web Map Tile Service (WMTS) standard, an open specification for how mapping web services should expose their tile sets. ArcGIS Server services that have a tile cache can respond to WMTS-formatted requests.
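For a sense of what a WMTS request looks like, here is a minimal sketch that builds a key-value-pair GetTile URL using the parameter names from the WMTS 1.0.0 specification. The endpoint, layer, and tile matrix set names in the usage note are placeholders; read the real values from the service's GetCapabilities document.

```python
from urllib.parse import urlencode


def wmts_gettile_url(endpoint, layer, tile_matrix_set, tile_matrix, row, col,
                     style="default", fmt="image/png"):
    """Build a WMTS 1.0.0 GetTile request in key-value-pair (KVP) form."""
    params = {
        "SERVICE": "WMTS",
        "REQUEST": "GetTile",
        "VERSION": "1.0.0",
        "LAYER": layer,
        "STYLE": style,
        "FORMAT": fmt,
        "TILEMATRIXSET": tile_matrix_set,
        "TILEMATRIX": tile_matrix,   # identifier of the zoom level
        "TILEROW": row,              # counted from the top of the tile matrix
        "TILECOL": col,              # counted from the left of the tile matrix
    }
    # urlencode escapes reserved characters such as the "/" in "image/png".
    return endpoint + "?" + urlencode(params)
```

For example, `wmts_gettile_url("https://myserver.example.com/arcgis/rest/services/LittleRock/MapServer/WMTS", "LittleRock", "default028mm", "12", 1630, 1020)` would request one tile, assuming that hypothetical server, layer, and tile matrix set exist.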
When you publish a map service or image service in ArcGIS Server, you can define whether it will have a cache and what the cache properties will be. You can either build the tiles right at the time the service is published, or you can start the tile building later using geoprocessing tools like Manage Map Server Cache Tiles. Building the tiles at publish time is appropriate for smaller cache jobs, and that's what we'll do in this lesson.
The tile cache you just built was pretty straightforward. You just gave the tool a map with symbology defined for each scale level, it created tiles, and within a few minutes, you had your cache. In this case, you were fortunate that you just needed a cache of Little Rock, Arkansas. But what if you needed a cache of the entire United States, or world, down to a large scale like 1:4,500? This could take days or weeks to build, and could require terabytes of disk space. Even if you were successful at building such a cache, would you be able to do it again if the source data were updated?
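You can get a feel for these numbers before committing to a job. The sketch below counts the Web Mercator tiles that intersect a longitude/latitude bounding box at each level and converts the total to a rough storage figure. The bounding box for the contiguous United States and the 20 KB average tile size are assumptions; a rectangle also overcounts compared with the actual land area, so treat the output as an upper-bound estimate.

```python
import math


def lonlat_to_tile(lon, lat, level):
    """Return the (column, row) of the Web Mercator tile containing a point."""
    n = 2 ** level
    col = int((lon + 180.0) / 360.0 * n)
    # Standard Web Mercator row formula; asinh(tan(lat)) == ln(tan(lat) + sec(lat)).
    row = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still map to a valid tile index.
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)


def tiles_in_bbox(west, south, east, north, level):
    """Count tiles at one level that intersect a lon/lat bounding box."""
    col0, row0 = lonlat_to_tile(west, north, level)   # top-left tile
    col1, row1 = lonlat_to_tile(east, south, level)   # bottom-right tile
    return (col1 - col0 + 1) * (row1 - row0 + 1)


def estimate_cache(bbox, max_level, avg_tile_kb):
    """Cumulative tile count and rough storage (GB) from level 0 to max_level."""
    total = sum(tiles_in_bbox(*bbox, level) for level in range(max_level + 1))
    return total, total * avg_tile_kb / (1024 * 1024)


if __name__ == "__main__":
    # Rough lon/lat box around the contiguous United States (an approximation).
    conus = (-125.0, 24.0, -66.0, 50.0)
    # 20 KB per tile is an assumed average; real tile sizes vary widely.
    for level in (10, 13, 15, 17):
        count, gb = estimate_cache(conus, level, avg_tile_kb=20)
        print(f"through level {level}: {count:,} tiles, about {gb:,.0f} GB")
```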
This section of the lesson discusses strategic approaches for building large caches. These are presented in the order in which they should be considered, meaning that if you skip down and implement one of the later strategies first, you still may end up doing things inefficiently.
If you need a tile cache that covers an enormous area at large scales, it would be worth your while to consider using one that someone else has built. Why go to the trouble if someone else has done it already? You've seen these types of worldwide tiled map services already throughout this course. They include ArcGIS Online, Bing Maps, and Google Maps. The companies that have built these caches have spent many thousands of dollars and hours collecting the data (often competing against each other for the best quality), building the tiles, and purchasing the hardware to serve them out in a rapid way. If you can get away with using them, you may save a great deal of time and resources.
The disadvantage of using someone else's tiles is that you cannot guarantee the accuracy or currency of the data. You don't get to choose the symbology or projection of the data either. Usually, you have to work in the Web Mercator projection.
Finally, if the tiled service goes offline for some reason or you lose your connection, you may have no control over when it will reappear. No server, whether it's maintained by Microsoft, Google, or Esri, can guarantee 100% uptime; however, this applies to your own servers as well. It's likely that these third-party services have better hardware infrastructure than your own when it comes to serving tiles; however, those tiles must still cross the Internet to get to your app, and that opens the door to potential connectivity problems.
Some organizations, especially those in the military and intelligence communities, have much of their network blocked from Internet access. Recognizing this, some tiled map service providers sell an appliance, basically a big server containing all the map tiles that can be plugged into your network. This eliminates the Internet access requirements, but still requires you to load periodic updates to the appliance. The Esri Data Appliance for ArcGIS [12] is an example of this type of appliance.
Some areas of a web map generate a lot more attention than other areas. Someone looking for directions to a particular house may zoom in to the largest available scale in an urban area. However, in the middle of the desert where there are few geographic features to see, it's unlikely that someone would ever zoom to a very large scale such as 1:1100 (the largest scale offered by ArcGIS Online/Bing Maps/Google Maps).
Creating tiles at small scales isn't a problem since it takes relatively few tiles to cover the map, but if you are limited on time or disk space, it pays to be selective about which tiles you cache at the largest scales.
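Here is a minimal sketch of that selective approach: below a cutoff level, every tile in the full extent is built; at and above it, only tiles touching an area of interest are kept. To stay short, it approximates the areas of interest with longitude/latitude rectangles rather than real polygons, and the study region and urban box in the usage block are hypothetical.

```python
import math


def lonlat_to_tile(lon, lat, level):
    """Return the (column, row) of the Web Mercator tile containing a point."""
    n = 2 ** level
    col = int((lon + 180.0) / 360.0 * n)
    row = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)


def tiles_for_bbox(bbox, level):
    """Set of (col, row) tiles touching a (west, south, east, north) box."""
    west, south, east, north = bbox
    col0, row0 = lonlat_to_tile(west, north, level)
    col1, row1 = lonlat_to_tile(east, south, level)
    return {(c, r) for c in range(col0, col1 + 1) for r in range(row0, row1 + 1)}


def tiles_to_build(full_extent, areas_of_interest, level, cutoff_level):
    """Whole extent below the cutoff level; only area-of-interest tiles at or above it."""
    if level < cutoff_level:
        return tiles_for_bbox(full_extent, level)
    selected = set()
    for aoi in areas_of_interest:
        # A set avoids counting tiles twice where areas of interest overlap.
        selected |= tiles_for_bbox(aoi, level)
    return selected


if __name__ == "__main__":
    region = (-92.7, 34.4, -91.9, 35.1)        # hypothetical study region
    urban = [(-92.45, 34.65, -92.20, 34.85)]   # hypothetical urban box
    for level in (12, 15, 17):
        full = len(tiles_for_bbox(region, level))
        chosen = len(tiles_to_build(region, urban, level, cutoff_level=15))
        print(f"level {level}: {chosen:,} of {full:,} tiles")
```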
Some GIS professionals have a hard time accepting the fact that they don't need to create every tile at every scale. They feel that all places are created equal, and shudder at the idea that someone might zoom to an area of their map and see a "Data not available" image. In fact, such an experience is now commonplace among laypeople who use web maps, who tend to blame themselves when they see a "Data not available" tile ("Oh, I zoomed in too far") as opposed to blaming the server administrator ("Why isn't there a map here!?").
A useful website for countering the idea that "all places are created equal" was Microsoft Hotmap, an old project by Microsoft researchers to visualize tile usage in Virtual Earth (now Bing Maps). This site is no longer functioning, but a screenshot below will give you an idea of its appearance. You could open Hotmap, zoom into your town, and then use the Select Data Level dropdown to visualize tile usage at different levels. At the zoomed-out data levels, most of the tiles are requested fairly often. But when you get down to the zoomed-in data levels (17-19), some clear patterns begin to emerge regarding where people want to see tiles: urban areas, major roads, coastlines, and other areas of interest. There are also some places where people never or rarely view tiles: wilderness areas, bodies of water, and so on. These are the tiles you don't want to spend your resources creating and storing. (For more images and analysis, see Fisher D 2007 Hotmap: Looking at geographic attention. IEEE Transactions on Visualization and Computer Graphics 13: 1184-91 [13] and Fisher D 2009 The Impact of Hotmap. WWW Document [14].)
A few years ago, one of the authors of this course undertook a project to selectively cache the state of California using the observed usage patterns in Hotmap. He and his colleague combined urban areas, roads, coastlines, and places of interest into a single vector dataset that covered about 25% of the land area of California, but included about 97% of its population. The use of this dataset to define tile creation, as opposed to the entire state boundary, saved nearly 1 million tiles when caching down to the 1:4500 scale (see Quinn S and Gahegan M 2010 A predictive model for frequently viewed tiles in a web map. Transactions in GIS 14: 193-216).
When using ArcGIS Server to create tiles, there are a couple of settings on the Manage Map Server Cache Tiles tool that allow you to be strategic about which tiles you create: the ability to check on and off the scales you want to create, and the ability to pass in a feature class boundary that defines the area of tile creation. For a large caching job, you'll probably run the tool at least twice. The first time, you'll have only the small scales checked and won't pass in a feature class, so the tool creates all the tiles at those scales. The second time, you'll have only the large scales checked, and you will pass in a feature class constraining the area where you want to create tiles, just like you did in the previous section of the lesson where you passed in the urban Little Rock feature class.
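The two-pass approach can be scripted. The sketch below splits a list of scale denominators at a cutoff and runs Manage Map Server Cache Tiles through arcpy, passing an area of interest only for the large-scale pass. It assumes the arcpy form `arcpy.server.ManageMapServerCacheTiles(input_service, scales, update_mode, ..., area_of_interest=...)`; check the tool reference for your ArcGIS version before relying on it. The service connection path and feature class are hypothetical.

```python
def plan_cache_jobs(input_service, scales, cutoff_scale, aoi_feature_class):
    """Split the cache scales into two Manage Map Server Cache Tiles jobs.

    Scales are scale denominators (for example 4513.99 for about 1:4,514).
    Scales smaller than the cutoff (larger denominators) cover the full
    extent; the rest are limited to the area-of-interest feature class.
    """
    small = sorted((s for s in scales if s > cutoff_scale), reverse=True)
    large = sorted((s for s in scales if s <= cutoff_scale), reverse=True)
    jobs = []
    if small:
        jobs.append({"input_service": input_service, "scales": small,
                     "update_mode": "RECREATE_ALL_TILES", "area_of_interest": None})
    if large:
        jobs.append({"input_service": input_service, "scales": large,
                     "update_mode": "RECREATE_ALL_TILES",
                     "area_of_interest": aoi_feature_class})
    return jobs


def run_cache_jobs(jobs):
    """Run each planned job with arcpy (requires an ArcGIS installation)."""
    import arcpy  # imported here so the planning step works without ArcGIS

    for job in jobs:
        extra = {}
        if job["area_of_interest"]:
            extra["area_of_interest"] = job["area_of_interest"]
        arcpy.server.ManageMapServerCacheTiles(
            job["input_service"], job["scales"], job["update_mode"], **extra)


if __name__ == "__main__":
    # Hypothetical connection file, service, and area-of-interest feature class.
    service = r"C:\connections\arcgis on myserver (admin).ags\LittleRock.MapServer"
    aoi = r"C:\data\LittleRock.gdb\UrbanArea"
    for job in plan_cache_jobs(service, [72223.82, 36111.91, 18055.95, 9027.98, 4513.99],
                               18055.95, aoi):
        print(job["scales"], job["area_of_interest"])
```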
The faster a map draws dynamically, the faster it will create cache tiles. All GIS software has its potential tweaks that can be made to increase performance, and ArcGIS is no exception. You've already learned, for example, that you can analyze your map using the Analyze button and see a list of potential performance issues.
Anything you can do to reduce computation will help your map draw faster. Matching the coordinate system of your source data, your data frame, and your web map will eliminate any costly projection on the fly. Saving out your labels to annotation (a way of storing labels in a database) will relieve the server from having to make label placement decisions while it is drawing your map. Spatial indexes [15] can help your map more quickly find the features that it needs to draw for each requested tile.
The more computing power you can put behind creating tiles, the faster you can build your cache. CPU and memory constraints are often more of a problem than having enough disk space to store the tiles.
There are two ways you can increase your server computing power: scaling up or scaling out. Scaling up means you replace your existing machine with something more powerful, like we did in this lesson. Scaling out means that you add more servers to your architecture, with these servers possibly all having the same size and spec.
The concept of having more than one server working on one job is called distributed computing. Although distributed computing can allow you to do great things, it comes with some unique challenges. All machines have to be able to see the data and access it, which may require some adjustment of paths used in your maps. For example, in a distributed setup, you want to use network paths like \\server\data, instead of local paths like c:\data. CloudFormation sets up your site so that if you put your data in C:\data on the site server instance (one named SITEHOST, for example), you can reference it through the path \\SITEHOST\data from any machine in your site.
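If you have many data paths to update, the conversion from a local path to a network path is mechanical. This is a minimal sketch, assuming the shared folder is C:\data and is shared as "data" on the host, as in the SITEHOST example above; the function name `to_unc` is hypothetical.

```python
from pathlib import PureWindowsPath


def to_unc(local_path, host, share_root=r"C:\data", share_name="data"):
    """Convert a path under the shared local folder into a \\\\host\\share path."""
    relative = PureWindowsPath(local_path).relative_to(PureWindowsPath(share_root))
    # relative_to raises ValueError if the path is outside the shared folder,
    # which is exactly the data other machines in the site would not see.
    unc_root = PureWindowsPath("\\\\" + host + "\\" + share_name)
    return str(unc_root / relative)


if __name__ == "__main__":
    print(to_unc(r"C:\data\LittleRock\roads.shp", "SITEHOST"))
```

Because PureWindowsPath does no file system access, this runs on any operating system, which makes it handy for checking a list of paths before you republish.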
Distributed computing may also require some adjustment of security settings so that the tile creation software has permissions to access the data from any machine. In ArcGIS Server, this is accomplished by giving the ArcGIS Server account permissions to your data folder (Cloud Builder does this for you), and registering the data folder with ArcGIS Server (you did this earlier in the course).
Cloud computing can be an attractive environment for building caches, because you can access a higher level of computing power than you might typically have in your office. Usually, you only need it for a short period (a few hours or days to create all the tiles), so the prospect of renting a server by the hour becomes very attractive.
One challenge with building tiles in the cloud is moving them around. First, you have to get your data onto the cloud so that your caching software can quickly get it as the tiles are being drawn. Then you have to move the tiles back to their final home, which may be on premises. Both of these transactions involve moving data across the Internet and can be influenced by your organization's bandwidth and security policies.
When creating tiles with ArcGIS Server on Amazon EC2, it's a lot easier to scale up than to scale out. As you have seen, Amazon offers the option to change the instance type (in other words, CPU, memory, etc.) without terminating the instance. This is very handy when you start doing something and realize you need a bigger machine, although you are required to stop the instance before you change size. Some of the largest instance types on Amazon EC2 have an enormous degree of CPU power and may negate the need to scale out. Scaling out ArcGIS Server on Amazon EC2 is accomplished by adding more GIS server machines to your site.
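If you'd rather script the resize than click through the EC2 console, the sequence is: stop, wait, change the type, start. This sketch uses the boto3 EC2 client methods `stop_instances`, `modify_instance_attribute`, and `start_instances`, plus the `instance_stopped` and `instance_running` waiters. The client is passed in so the function can be exercised without AWS credentials.

```python
def change_instance_type(ec2, instance_id, new_type):
    """Stop an EC2 instance, change its instance type, and start it again.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # The type can only be changed once the instance is fully stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  InstanceType={"Value": new_type})
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

Usage would look like `change_instance_type(boto3.client("ec2"), "i-0123456789abcdef0", "r4.4xlarge")`, where the instance ID and memory-optimized type are hypothetical. Running it again with your original general purpose type scales the machine back down after the tiles are built.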
Think back over the above strategies and consider why the techniques at the beginning should be employed before those at the end. It can be exciting to think about how many tiles you can build with distributed computing and all the computing horsepower that's available through the cloud. However, you may actually save the most time and resources by carefully planning which scales you want to create and selectively generating tiles at the largest scales. If the cache is still going to be overwhelmingly large, consider using an existing cache or a data appliance. By using a combination of the above strategies, you can usually find a way to build the cache you need, whatever the size.
A tile cache is just a picture of your data at one point in time. If that data ever changes, you need to update the cache. This final section of the lesson gives some practical considerations for updating and maintaining a cache over time.
Your update strategy probably should have come into consideration before you even decided you were going to create a cache. If you need to see data in real time, or you have frequent changes occurring over broad extents of the map, then creating a tile cache may not be appropriate.
For each map, there's a threshold of acceptable data currency. For a neighborhood street map available in your handheld GPS, you may find it acceptable if the street data is updated once every three months. For a tax assessor looking at land parcels, it may be acceptable to have the data current to within the past day or two. For a 911 operator tracking a vehicle's progress, a delay of more than a few seconds may not be acceptable.
If the cache update can be performed within the threshold of acceptable data currency, then it may make sense to create a cache. If the cache cannot be updated that quickly, then caching should not be used.
There are two approaches to cache updates: regenerate the entire cache, or focus the updates on places where the data has changed. If your entire cache can be rebuilt within the threshold of acceptable data currency, then the first option may be easier; you can just kick off a rebuild of all the tiles and be done.
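The decision above comes down to simple arithmetic once you know your caching throughput. This sketch assumes you have measured a steady tiles-per-second rate and can estimate how many tiles the dirty areas touch; the function name and return strings are illustrative only.

```python
def choose_update_strategy(total_tiles, dirty_tiles, tiles_per_second, threshold_hours):
    """Pick an update approach from estimated build times.

    total_tiles: tiles in the whole cache
    dirty_tiles: tiles touched by dirty areas since the last update
    tiles_per_second: measured throughput of your caching setup
    threshold_hours: how stale the cache is allowed to get
    """
    full_hours = total_tiles / tiles_per_second / 3600.0
    partial_hours = dirty_tiles / tiles_per_second / 3600.0
    if full_hours <= threshold_hours:
        return "rebuild entire cache"
    if partial_hours <= threshold_hours:
        return "update dirty areas only"
    # Neither option keeps up with how current the data must be.
    return "reconsider caching"
```

For example, 3,600,000 tiles at 100 tiles per second is a 10-hour rebuild, so `choose_update_strategy(3_600_000, 360_000, 100, 24)` returns `"rebuild entire cache"`, while a 5-hour threshold would push you to dirty-area updates.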
If your cache is very large and it is undesirable to rebuild the entire thing, then you need some way to track places that have been edited (for the sake of this discussion, we'll call these "dirty areas"). You can then pass the dirty area polygons into your caching tools to define where tile updates should occur.
So how do you find the dirty areas? One approach is to track them as edits are being made: each transaction can be logged to a database and, at the end of the edit session, the spatial extents of all the transactions can be exported to create a vector dataset of dirty areas.
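Here is a minimal sketch of turning such an edit log into dirty areas. It assumes a hypothetical log format in which each entry records the feature's extent before and after the edit, and it pads each extent (an assumed allowance for symbols and labels that draw beyond the feature). The output is written as WKT so it could be loaded into a feature class.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def dirty_extents(edit_log: List[dict], pad: float = 0.0) -> List[BBox]:
    """Collect a padded extent for every geometry location touched by an edit."""
    dirty = []
    for edit in edit_log:
        # An update touches both the old and the new location; an insert has
        # no "before" extent and a delete has no "after" extent.
        for key in ("before", "after"):
            box = edit.get(key)
            if box is not None:
                xmin, ymin, xmax, ymax = box
                # Padding allows for symbols and labels drawn beyond the feature.
                dirty.append((xmin - pad, ymin - pad, xmax + pad, ymax + pad))
    return dirty


def to_wkt(box: BBox) -> str:
    """Write an extent as a WKT polygon so it can be loaded into a feature class."""
    xmin, ymin, xmax, ymax = box
    return (f"POLYGON (({xmin} {ymin}, {xmax} {ymin}, {xmax} {ymax}, "
            f"{xmin} {ymax}, {xmin} {ymin}))")
```

Keeping the before and after extents separate matters when a feature is moved a long way: a single box covering both positions could span much of the map and trigger far more tile updates than necessary.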
If real-time tracking of the dataset editing is not an option, you can attempt to compare two datasets directly for attributes or spatial features that do not match. This type of strategy is required when you receive a dataset update without any record of how it was created (such as from a data vendor every six months). It requires that features have at least one key field in common between the two datasets. Comparing attributes is necessary if map symbolization or labeling could change based on a field value.
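The key-field comparison can be sketched in a few lines. This version assumes each dataset has been read into a list of dictionaries and that geometry is available as a comparable field value (for example, a WKT string); comparing real geometries usually needs a tolerance, which this sketch does not attempt.

```python
def compare_by_key(old_rows, new_rows, key, compare_fields):
    """Return sorted keys that were added, deleted, or changed between versions.

    compare_fields should include every field that affects drawing: the
    geometry plus any field used for symbology or labeling.
    """
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys()
                     if any(old[k].get(f) != new[k].get(f) for f in compare_fields))
    return added, deleted, changed
```

The dirty areas would then be the new extents of added features, the old extents of deleted features, and both extents of changed features.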
Accomplishing either of the above solutions in ArcGIS requires custom programming. Fortunately, this problem is common enough that people have posted some scripts and tools online that help address it. The Show Edits Since Reconcile [16] tool, written by Tom Brenneman, compares two versions of an ArcSDE geodatabase and outputs a feature class of spatial discrepancies. It can be installed into your list of toolboxes in ArcGIS. A similar tool, Compare two feature classes in a file geodatabase [17], written by Sterling Quinn, is designed for those who do not have their data in ArcSDE.
Basing an ArcGIS tile cache update on dirty areas requires some degree of caution. A feature class full of small, adjacent polygons can cause the Manage Map Server Cache Tiles tool to work slowly and inefficiently. If there are a lot of small dirty areas in close proximity, they should be merged before the dirty areas feature class is used to define a caching job.
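One simple way to merge nearby dirty areas is to repeatedly combine any two rectangles that overlap or lie within a tolerance of each other until no such pair remains. This sketch works on bounding rectangles only; merging true polygons would need a geometry library, and the tolerance value is up to you (for example, about one tile width at your largest scale).

```python
def boxes_near(a, b, tol):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or are within tol."""
    return not (a[2] + tol < b[0] or b[2] + tol < a[0] or
                a[3] + tol < b[1] or b[3] + tol < a[1])


def merge_dirty_areas(boxes, tol=0.0):
    """Merge boxes until no two remaining boxes are within tol of each other."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        result = []
        for box in merged:
            for i, other in enumerate(result):
                if boxes_near(box, other, tol):
                    # Replace the existing box with the union of the two.
                    result[i] = (min(box[0], other[0]), min(box[1], other[1]),
                                 max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
            else:
                result.append(box)
        merged = result
    return merged
```

The loop repeats because a merged box can grow large enough to touch a box it was not near before; it stops after a full pass with no merges.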
It's common to perform tile cache updates on a regular basis, such as every three months, every week, or every evening. Because caching is so resource-intensive, many server administrators like to build the updated tiles on a staging server and then copy them to their production server. This avoids disruption to those who are viewing tiles on the live website.
Whether you use a staging server or not, it's wise to perform the update during times when the fewest possible individuals will be using your site. For most sites, this is during the early morning hours or the weekend. Since you probably do not want to log in at 2 AM Sunday morning to run your caching tools, it's worth exploring whether your tile caching software can be automated and scheduled to run at given times.
The ArcGIS tools, for example, can be automated using a Python script. Python is a relatively simple programming language to learn, and it can be used to run any ArcGIS tool, including Manage Map Server Cache Tiles. For a full update process, you might decide to chain several tools and functions together in one script, such as:
Once you have a script that does everything you need, you can use your operating system to schedule it to run on a regular basis. Task Scheduler, included with Windows, is an example of a program that can run scripts on a repeated basis at any time you specify (such as nights or weekends).
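As an illustration, Windows' `schtasks` command can register a weekly task from the command line. The sketch below only builds the argument list for a task that runs every Sunday at 2:00 AM; the task name, Python interpreter path, and script path are hypothetical.

```python
def schtasks_create_args(task_name, python_exe, script_path, day="SUN", start_time="02:00"):
    """Build a `schtasks /Create` command for a weekly scheduled script."""
    # /TR is the command line Task Scheduler will run; quote paths with spaces.
    command = f'"{python_exe}" "{script_path}"'
    return ["schtasks", "/Create", "/TN", task_name, "/TR", command,
            "/SC", "WEEKLY", "/D", day, "/ST", start_time]


if __name__ == "__main__":
    args = schtasks_create_args("UpdateTileCache",
                                r"C:\Python27\ArcGIS10.8\python.exe",
                                r"C:\scripts\update_cache.py")
    print(args)
    # On the Windows server, register it with: subprocess.run(args, check=True)
```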
Python scripting with ArcGIS is taught in Penn State's Geog 485: GIS Programming and Software Development [18]. If you're curious to see an example of a Python script that updates a cache, check out the ArcGIS help topic Automating cache creation and updates with geoprocessing [19].
In this assignment, you will put together all of the ArcGIS Server skills that you learned in Lessons 2 - 4. Starting with a folder of raw GIS datasets, you will compose maps, publish them as web services, and assemble those services into a web application. You will create a video tour of your web application so that you don't have to leave your server running as the project is graded.
The data for this assignment consists of vector feature classes covering an area around a town. I downloaded these from the State of California Geoportal [20] (formerly the California Spatial Information Library - CaSIL) and did some post-processing on them so that they cover the same extent. Don't worry too much about what town this really is; for this assignment, consider that it could be Anytown, USA.
Download the data for this assignment [21]
Pretend you work for a town that up until now has only done GIS in the desktop realm (maybe there is no pretending needed). You are moving to ArcGIS Server for the first time. You want to take your GIS data and make it available in a series of highly-focused web applications.
Your first application will focus on your urban flooding dataset. This is a point feature class that shows areas in the city that tend to pool with water and flood during a storm event. Your web app will allow "non-GIS-trained" personnel in other city departments to add and remove points from this layer.
You've been asked to create a basemap web service that will be used as a backdrop in this web application and other apps your town will create in the future. You must design this basemap yourself and create a tile cache for it. An existing basemap from ArcGIS Online, Bing Maps, or Google Maps cannot be used because the map needs to show your town's own data. However, you can imitate design principles and techniques used in those maps.
You are also to create a separate web service containing only the urban flooding layer. This layer should be exposed as a feature service and should be editable. This involves loading the source data into SQL Server Express as shown in Lesson 3.
Once you have created these two web services, you must overlay them in a web application that allows the urban flooding service to be edited by the application user. Do this using the ArcGIS Web AppBuilder unless you already have extensive coding experience with another API such as the ArcGIS API for JavaScript.
Because this assignment takes a fair amount of time, there is no cloud computing discussion assignment this week.
To minimize the amount of time your cloud-based server is left running, this project will be graded based on a short video tour of your app. You should record this using Zoom, Screencast-O-Matic, or a comparable screen recording utility of your choice. Your video must demonstrate the following features in your ArcGIS Services Directory and your flooding application. Each item is worth 3 points, resulting in a total of 30 points available for this project (making it three times the value of a typical weekly assignment):
I recommend you use your video recording software to export an .MP4 file or some other easily shareable format. You can host the file on YouTube, your PSU Microsoft OneDrive space, or some other online repository and provide a link (make sure it is viewable by the faculty). Zoom [22] is a tool available to PSU faculty, staff, and students that lets you share and record your screen easily. Zoom recordings are saved as .MP4 files. If you don't want to put the video online or can't get that to work, you can upload it to Canvas. Contact your instructor if these options don't work.
Do not host the video on your EC2 instance. Your instance should be stopped when you are not working on this course.
Links
[1] http://pro.arcgis.com/en/pro-app/help/mapping/map-authoring/author-a-map-for-vector-tile-creation.htm
[2] https://aws.amazon.com/ec2/instance-types/#Memory_Optimized
[3] https://aws.amazon.com/ec2/instance-types/#General_Purpose
[4] http://aws.amazon.com/ec2/instance-types/
[5] https://aws.amazon.com/ec2/pricing/on-demand/
[6] http://maps.google.com
[7] http://www.core77.com/blog/case_study/google_maps_designing_the_modern_atlas_21486.asp
[8] http://mike.teczno.com/notes/osm-us-terrain-layer.html
[9] http://www.arcgis.com/home/item.html?id=6e850093c837475e8c23d905ac43b7d0
[10] http://www.arcgis.com/home/webmap/viewer.html
[11] http://mapnik.org/
[12] https://doc.arcgis.com/en/data-appliance/
[13] https://www.microsoft.com/en-us/research/publication/hotmap-looking-at-geographic-attention/
[14] http://research.microsoft.com/apps/pubs/default.aspx?id=81244
[15] http://desktop.arcgis.com/en/arcmap/latest/manage-data/geodatabases/an-overview-of-spatial-indexes-in-the-geodatabase.htm
[16] http://www.arcgis.com/home/item.html?id=b75fc9edf166438c82d66f4982e4e031
[17] http://esriurl.com/compare
[18] http://www.e-education.psu.edu/geog485/
[19] http://server.arcgis.com/en/server/latest/publish-services/windows/automating-cache-creation-and-updates-with-geoprocessing.htm
[20] https://gis.data.ca.gov/
[21] https://www.e-education.psu.edu/geog865/sites/www.e-education.psu.edu.geog865/files/Town.zip
[22] https://psu.zoom.us/