In Chapter 3, we studied the population data produced by the U.S. Census Bureau, and some of the ways those data can be visualized with thematic maps.
In addition to producing data about the U.S. population and economy, the Census Bureau is a leading producer of digital map data. The Census Bureau's Geography Division created its "Topologically Integrated Geographic Encoding and Referencing" (TIGER) spatial database with help from the U.S. Geological Survey. In preparation for the 2010 census, the Bureau conducted a database redesign project that combined TIGER with a Master Address File (MAF) database. MAF/TIGER enables the Bureau to associate census data, which it collects by household address, with the right census areas and voting districts. This is an example of a process called address-matching or geocoding.
The MAF/TIGER database embodies the vector approach to spatial representation. It uses point, line, and polygon features to represent streets, water bodies, railroads, administrative boundaries, and select landmarks. In addition to the "absolute" locations of these features, which are encoded with latitude and longitude coordinates, MAF/TIGER encodes their "relative" locations--a property called topology.
MAF/TIGER also includes attributes of these vector features including names, administrative codes, and, for many streets, address ranges and ZIP Codes. Vector feature sets are extracted from the MAF/TIGER database to produce reference maps for census takers and thematic maps for census data users. Such extracts are called TIGER/Line Shapefiles.
Characteristics of TIGER/Line Shapefiles that make them useful to the Census Bureau also make them valuable to other government agencies and businesses. Because they are not protected by copyright, TIGER/Line data have been widely adapted for many commercial uses. TIGER has been described as "the first truly useful nationwide general-purpose spatial data set" (Cooke 1997, p. 47). Some say that it jump-started a now-thriving geospatial data industry in the U.S.
The objective of this chapter is to familiarize you with MAF/TIGER and two important concepts it exemplifies: topology and geocoding. Specifically, students who successfully complete Chapter 4 should be able to:
Take a minute to complete any of the Try This activities that you encounter throughout the chapter. These are fun, thought-provoking exercises to help you better understand the ideas presented in the chapter.
You may be interested in seeing the concept map used to guide development of Chapters 3 and 4. The concept map delineates the entities and relationships that make up the contents of the two chapters.
MAF/TIGER is the Census Bureau's geographic database system. Several factors prompted the U.S. Census Bureau to create MAF/TIGER: the need to conduct the census by mail, the need to produce wayfinding aids for census field workers, and its mission to produce map and data products for census data users.
As the population of the U.S. increased, it became impractical to have census takers visit every household in person. Since 1970, the Census Bureau has mailed questionnaires to most households, with instructions that completed forms should be returned by mail. Most, but certainly not all, of these questionnaires are dutifully mailed back: about 72 percent of all questionnaires in 2010. At that rate, the Census Bureau estimates that some $1.6 billion was saved by reducing the need for field workers to visit non-responding households.
To manage its mail delivery and return operations, the Census Bureau relies upon a Master Address File (MAF). MAF is a complete inventory of housing units and many business locations in the U.S., Puerto Rico, and associated island areas. MAF was originally built from the U.S. Postal Service’s Delivery Sequence File of all residential addresses. The MAF is updated through both corrections from field operations and a Local Update of Census Addresses (LUCA) program by which tribal, state, and local government liaisons review and suggest updates to local address records. “MAF/TIGER” refers to the coupling of the Master Address File with the TIGER spatial database. Together, they enable the Census Bureau to efficiently associate address-referenced census and survey data received by mail with geographic locations on the ground and with tabulation areas of concern to Congress and many governmental agencies and businesses.
It’s not as simple as it sounds. Postal addresses do not specify geographic locations precisely enough to fulfill the Census Bureau’s constitutional mandate. An address is not a position in a grid coordinate system; it is only one in a series of ill-defined positions along a route. The location of an address is often ambiguous because street names are not unique, numbering schemes are inconsistent, and routes have two sides, left and right. Location matters, as you recall, because census data must be accurately georeferenced to be useful for reapportionment, redistricting, and allocation of federal funds. Thus, the Census Bureau had to find a way to assign address-referenced data automatically to particular census blocks, block groups, tracts, voting districts, and so on. That's what the "Geographic Encoding and Referencing" in the TIGER acronym refers to.
A second motivation that led to MAF/TIGER was the need to help census takers find their way around. Millions of households fail to return questionnaires by mail, after all. Census takers (called “enumerators” at the Bureau) visit non-responding households in person. Census enumerators need maps showing streets and select landmarks to help locate households. Census supervisors need maps to assign census takers to particular territories. Field notes collected by field workers are an important source of updates and corrections to the MAF/TIGER database.
Prior to 1990, the Bureau relied on local sources for its maps. For example, 137 maps of different scales, quality, and age were used to cover the 30-square-mile St. Louis area during the 1960 census. The need for maps of consistent scale and quality forced the Bureau to become a map maker as well as a map user. Using the MAF/TIGER system, Census Bureau geographers created over 17 million maps for a variety of purposes in preparation for the 2010 Census.
The Census Bureau's mission is not only to collect data, but also to make data products available to its constituents. In addition to the attribute data considered in Chapter 3, the Bureau disseminates a variety of geographic data products, including wall maps, atlases, and one of the earliest online mapping services, the TIGER Mapping Service. You can explore the Bureau's maps and cartographic data products here.
The Census Bureau conducted a major redesign of the MAF/TIGER database in the years leading up to the 2010 decennial census. What were separate, homegrown database systems (MAF and TIGER) are now unified in the industry-standard Oracle relational database management system. Benefits of this “commercial off-the-shelf” (COTS) database software include concurrent multi-user access, greater user familiarity, and better integration with web development tools. As Galdi (2005) explains in his white paper, “Spatial Data Storage and Topology in the Redesigned MAF/TIGER System,” the redesign “mirrors a common trend in the Information Technology (IT) and Geographic Information System (GIS) industries: the integration of spatial and non-spatial data into a single enterprise data set” (p. 2).
Concurrent with the MAF/TIGER redesign, the Census Bureau also updated the distribution format of its TIGER/Line map data extracts. Consistent with the Bureau’s COTS strategy, it adopted the de facto standard Esri “Shapefile” format. The following pages consider characteristics of the spatial data stored in MAF/TIGER and in TIGER/Line Shapefile extracts.
The Census Bureau began to develop a digital geographic database of 144 metropolitan areas in the 1960s. By 1990, the early efforts had evolved into TIGER: a seamless digital geographic database that covered the whole of the United States and its territories. As discussed on the previous page, MAF/TIGER succeeded TIGER in the lead-up to the 2010 Census.
TIGER/Line Shapefiles are digital map data products extracted from the MAF/TIGER database. They are freely available from the Census Bureau and are suitable for use by individuals, businesses, and other agencies that don’t have direct access to MAF/TIGER.
This section outlines the geographic entities represented in the MAF/TIGER database, describes how a particular implementation of the vector data model is used to represent those entities, and considers the accuracy of digital features in relation to their counterparts on the ground. The following page considers characteristics of the “Shapefile” data format used to distribute digital extracts from MAF/TIGER.
The MAF/TIGER database is selective. Only those geographic entities needed to fulfill the Census Bureau’s operational mission are included. Entities that don't help the Census Bureau conduct its operations by mail, or help field workers navigate a neighborhood, are omitted. Terrain elevation data, for instance, are not included in MAF/TIGER. A comprehensive list of the “feature classes” and “superclasses” included in MAF/TIGER and Shapefiles can be found via the MAF/TIGER Feature Class Codes (MTFCCs) link on the list of Geographic Codes on the Census.gov > Geography > Reference page. Examples of superclasses include:
MTFCC | FEATURE CLASS | SUPERCLASS | POINT | LINEAR | AREAL | FEATURE CLASS DESCRIPTION |
---|---|---|---|---|---|---|
S1400 | Local Neighborhood Road, Rural Road, City Street | Road/Path Features | N | Y | N | Generally a paved non-arterial street, road, or byway that usually has a single lane of traffic in each direction. Roads in this feature class may be privately or publicly maintained. Scenic park roads would be included in this feature class, as would (depending on the region of the country) some unpaved roads. |
S1500 | Vehicular Trail (4WD) | Road/Path Features | N | Y | N | An unpaved dirt trail where a four-wheel drive vehicle is required. These vehicular trails are found almost exclusively in very rural areas. Minor, unpaved roads usable by ordinary cars and trucks belong in the S1400 category. |
S1630 | Ramp | Road/Path Features | N | Y | N | A road that allows controlled access from adjacent roads onto a limited access highway, often in the form of a cloverleaf interchange. These roads are unaddressable. |
Note also that neither the MAF/TIGER database nor TIGER/Line Shapefiles include the population data collected through questionnaires and by census takers. MAF/TIGER merely provides the geographic framework within which address-referenced census data are tabulated.
In this Try This! (the first of 3 dealing with TIGER/Line Shapefiles), you are going to explore which TIGER/Line Shapefiles are available for download at various geographies and what information those files contain. We will be exploring the 2009 and 2010 versions of the TIGER/Line Shapefile data sets. Versions from other years are available. Feel free to investigate those, too.
As stated above, we want you to get a sense of the sorts of data that are available for the various geographies, from the county to the national level. Perusing the layers one at a time, as you did above, makes it difficult to assess what data exist at a given geographic scale. Fortunately for our purposes, the Census Bureau has provided a convenient table to help us in this regard.
From the TIGER/Line Shapefiles page,
What files are available for a state that are not available for the whole nation? Can you think of reasons why these are not available as a single national file?
What files available at the state level are also available at the county-level? Once again, share your thoughts with your peers.
Like other implementations of the vector data model, MAF/TIGER represents geographic entities using geometric primitives including nodes (point features), edges (linear features), and faces (area features). These are defined and illustrated below.
Until recently, the geometric accuracy of the vector features encoded in TIGER was notoriously poor (see figure below). How poor? Through 2003, the TIGER/Line metadata stated that
Coordinates in the TIGER/Line files have six implied decimal places, but the positional accuracy of these coordinates is not as great as the six decimal places suggest. The positional accuracy varies with the source materials used, but generally, the information is no better than the established National Map Accuracy standards for 1:100,000-scale maps from the U.S. Geological Survey (Census Bureau 2003).
Having performed scale calculations in Chapter 2, you should be able to calculate the magnitude of error (ground distance) associated with 1:100,000-scale topographic maps. Recall that the allowed error for USGS topographic maps at scales of 1:20,000 or smaller is 1/50 inch (see the nationalmap standards pdf).
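If you would like to check your arithmetic, the calculation can be sketched in a few lines of Python. The function name `ground_error` is hypothetical; the only inputs are the 1/50-inch allowance and the 1:100,000 scale mentioned above.

```python
INCHES_PER_FOOT = 12
METERS_PER_INCH = 0.0254  # exact, by definition of the international inch


def ground_error(map_error_inches, scale_denominator):
    """Convert an allowable error on the map sheet to a ground distance."""
    ground_inches = map_error_inches * scale_denominator
    return {
        "inches": ground_inches,
        "feet": ground_inches / INCHES_PER_FOOT,
        "meters": ground_inches * METERS_PER_INCH,
    }


# 1/50 inch on a 1:100,000-scale map
error = ground_error(1 / 50, 100_000)
print(f"{error['feet']:.1f} ft, {error['meters']:.1f} m")
```

In other words, 1/50 inch on the map corresponds to 2,000 inches on the ground, which is roughly 167 feet or about 51 meters.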
Starting in 2002, in preparation for the 2010 census, the Census Bureau commissioned a six-year, $200 million MAF/TIGER Accuracy Improvement Project (MTAIP). One objective of the effort was to use GPS to capture accurate geographic coordinates for every household in the MAF. Another objective was to improve the accuracy of TIGER's road/path features. The project aimed to adjust the geometry of street networks to align within 7.6 meters of street intersections observed in orthoimages or measured using GPS. The corrected streets are necessary not just for mapping, but for accurate geocoding. Because streets often form the boundaries of census areas, it is essential that accurate household locations are associated with accurate street networks.
MTAIP integrated over 2,000 source files submitted by state, tribal, county, and local governments. Contractors used survey-grade GPS to evaluate the accuracy of a random sample of street centerline intersections of the integrated source files. The evaluation confirmed that most but not all features in the spatial database equal or exceed the 7.6-meter target. Uniform accuracy wasn’t possible due to the diversity of local source materials used, though this accuracy is the standard in the "All Lines" Shapefile extracts. The geometric accuracy of particular feature classes included in particular shapefiles is documented in the metadata associated with that shapefile extract.
MTAIP was completed in 2008. In conjunction with the continuous American Community Survey and other census operations, corrections and updates are now ongoing. TIGER/Line Shapefile updates are now released annually.
Since 2007, TIGER/Line extracts from the MAF/TIGER database have been distributed in shapefile format. Esri introduced shapefiles in the early 1990s as the native digital vector data format of its ArcView software product. The shapefile format is proprietary but open; its technical specifications are published and can be implemented and used freely. Largely as a result of ArcView’s popularity, shapefile has become a de facto standard for creation and interchange of vector geospatial data. The Census Bureau’s adoption of Shapefile as a distribution format is therefore consistent with its overall strategy of conformance with mainstream information technology practices.
The first thing GIS pros need to know about shapefiles is that every shapefile data set includes a minimum of three files. One of the three required files stores the geometry of the digital features as sets of vector coordinates. A second required file holds an index that, much like the index in a book, allows quick access to the spatial features and therefore speeds processing of a given operation involving a subset of features. The third required file stores attribute data in dBASE© format, one of the earliest and most widely-used digital database management system formats. All of the files that make up a Shapefile data set have the same root or prefix name, followed by a three-letter suffix or file extension. For a shapefile data set named “counties,” the three required files are counties.shp (feature geometry), counties.shx (index), and counties.dbf (attributes). Take note of the file extensions.
Esri lists twelve additional optional files, and practitioners are able to include still others. Two of the most important optional files are the “.prj” file, which includes the coordinate system definition, and “.xml”, which stores metadata. (Why do you suppose that something as essential as a coordinate system definition is considered “optional”?)
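One practical consequence of the multi-file structure is that a shapefile data set copied without one of its required files will not open properly. The following is a minimal sketch of a check you might run before loading a data set. The function names are hypothetical, and the sketch assumes lowercase file extensions.

```python
from pathlib import Path

# The three required components of a shapefile data set.
REQUIRED = {
    ".shp": "feature geometry",
    ".shx": "index",
    ".dbf": "attribute table (dBASE format)",
}


def missing_components(directory, root_name):
    """List the required extensions that are absent for a given root name."""
    folder = Path(directory)
    return [ext for ext in REQUIRED if not (folder / (root_name + ext)).is_file()]


def has_projection_file(directory, root_name):
    """Report whether the optional .prj coordinate system definition exists."""
    return (Path(directory) / (root_name + ".prj")).is_file()
```

For example, `missing_components("data", "counties")` would return an empty list if counties.shp, counties.shx, and counties.dbf are all present in a folder named "data" (a folder name assumed here for illustration).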
In this Try This! (the second of 3 dealing with TIGER/Line Shapefiles), you will download a TIGER/Line Shapefile dataset, investigate the file structure of a typical Esri shapefile, and view it in GIS software.
You can use a software application called Global Mapper (originally known as dlgv32 Pro) to investigate TIGER/Line shapefiles. Originally developed by the staff of the USGS Mapping Division at Rolla, Missouri as a data viewer for USGS data, Global Mapper has since been commercialized but remains available in a free trial version. The instructions below will guide you through the process of installing the software and opening the TIGER/Line data.
What do you think has to be understood by the mapping application to allow it to automatically symbolize features differently?
A single shapefile data set can contain one of three types of spatial data primitives, or features – points, lines or polygons (areas). The technical specification defines these as follows:
At left in the figure above, a polygon Shapefile data set holds the Census blocks in which the edges from the MAF/TIGER database have been combined to form two distinct polygons, P1 and P2. The diagram shows the two polygons separated to emphasize the fact that what is the single E12 edge in the MAF/TIGER database (see Figure 4.4.1 on page 4) is now present in each of the Census block polygon features.
In the middle of the illustration, above, a polyline Shapefile data set holds seven line features (L1-7) that correspond to the seven edges in the MAF/TIGER database. The directionality of the line features that represent streets corresponds to address range attributes in the associated dBASE© table. Vertices define the shape of a polygon or a line, and the Start and End Nodes from the MAF/TIGER database are now First and Last Vertices.
Finally, at right in the illustration above, a point Shapefile data set holds the three isolated nodes from the MAF/TIGER database.
Topology is different from topography. (You’d be surprised how often these terms get mixed up.) In Chapter 2, you read about the various ways that absolute positions of features can be specified in a coordinate system, and how those coordinates can be projected or otherwise transformed. Topology refers to the relative positions of spatial features. Topological relations among features, such as containment, connectivity, and adjacency, don’t change when a dataset is transformed. For example, if an isolated node (representing a household) is located inside a face (representing a congressional district) in the MAF/TIGER database, you can count on it remaining inside that face no matter how you might project, rubber-sheet, or otherwise transform the data. Topology is vitally important to the Census Bureau, whose constitutional mandate is to accurately associate population counts and characteristics with political districts and other geographic areas.
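To see why containment survives a transformation, consider the following sketch. It tests whether a point lies inside a polygon, then applies the same invertible affine transformation (a simple stand-in for projecting or rubber-sheeting, chosen here for illustration) to both and tests again. All coordinates and function names are hypothetical. Note also that this sketch computes containment from coordinates, whereas MAF/TIGER stores such relationships explicitly.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: count boundary crossings of a ray cast to the right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this side straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def affine(point, a, b, c, d, tx, ty):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty (invertible if a*d != b*c)."""
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


face = [(0, 0), (4, 0), (4, 3), (0, 3)]   # a face, e.g. a district
household = (1, 1)                        # an isolated node inside the face
elsewhere = (5, 1)                        # a node outside the face

params = dict(a=2.0, b=0.5, c=-0.3, d=1.5, tx=100.0, ty=-40.0)
moved_face = [affine(p, **params) for p in face]

before = (point_in_polygon(household, face), point_in_polygon(elsewhere, face))
after = (
    point_in_polygon(affine(household, **params), moved_face),
    point_in_polygon(affine(elsewhere, **params), moved_face),
)
print(before == after)
```

Every coordinate changes under the transformation, yet the household remains inside the face and the other node remains outside it.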
As David Galdi (2005) explains in his white paper “Spatial Data Storage and Topology in the Redesigned MAF/TIGER System,” the “TI” in TIGER stands for “Topologically Integrated.” This means that the various features represented in the MAF/TIGER database—such as streets, waterways, boundaries, and landmarks (but not elevation!)—are not encoded on separate “layers.” Instead, features are made up of a small set of geometric primitives—including 0-dimensional nodes and vertices, 1-dimensional edges, and 2-dimensional faces—without redundancy. That means that where a waterway coincides with a boundary, for instance, MAF/TIGER represents them both with one set of edges, nodes, and vertices. The attributes associated with the geometric primitives allow database operators to retrieve feature sets efficiently with simple spatial queries. The separate feature-specific TIGER/Line Shapefiles published at the county level (such as point landmarks, hydrography, Census block boundaries, and the "All Lines" file you are using in the multi-part "Try This") were extracted from the MAF/TIGER database in that way. Notice, however, that when you examine a hydrography shapefile and a boundary shapefile, you will see redundant line segments where the features coincide. That fact confirms that TIGER/Line Shapefiles, unlike the MAF/TIGER database itself, are not topologically integrated. Desktop computers are now powerful enough to calculate topology “on the fly” from shapefiles or other non-topological data sets. However, the large batch processes performed by the Census Bureau still benefit from the MAF/TIGER database’s persistent topology.
MAF/TIGER’s topological data structure also benefits the Census Bureau by allowing it to automate error-checking processes. By definition, features in the TIGER/Line files conform to a set of topological rules (Galdi 2005):
Compliance with these topological rules is an aspect of data quality called logical consistency. In addition, the boundaries of geographic areas that are related hierarchically—such as blocks, block groups, tracts, and counties—are represented with common, non-redundant edges. Features that do not conform to the topological rules can be identified automatically, and corrected by the Census geographers who edit the database. Given that the MAF/TIGER database covers the entire U.S. and its territories, and includes many millions of primitives, the ability to identify errors in the database efficiently is crucial.
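The idea of non-redundant primitives and automated error checking can be illustrated with a small sketch. Here each edge is stored once and records its bounding nodes and the faces on its left and right. The `Edge` class, the identifiers, and the two checks are assumptions made for illustration; they are not the Census Bureau's actual schema or its full set of rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    edge_id: str
    start_node: str
    end_node: str
    left_face: str
    right_face: str


def adjacent_faces(edges):
    """Derive face adjacency from the shared edges; no geometry is needed."""
    pairs = set()
    for e in edges:
        if e.left_face != e.right_face:
            pairs.add(frozenset((e.left_face, e.right_face)))
    return pairs


def find_violations(edges, node_ids, face_ids):
    """Flag edges that lack a known bounding node or a known left/right face."""
    problems = []
    for e in edges:
        for role, node in (("start", e.start_node), ("end", e.end_node)):
            if node not in node_ids:
                problems.append(f"{e.edge_id}: unknown {role} node {node!r}")
        for side, face in (("left", e.left_face), ("right", e.right_face)):
            if face not in face_ids:
                problems.append(f"{e.edge_id}: unknown {side} face {face!r}")
    return problems


NODES = {"N1", "N2"}
FACES = {"P1", "P2", "OUT"}  # "OUT" stands for the area outside both blocks
EDGES = [
    Edge("E12", "N1", "N2", "P1", "P2"),   # shared boundary, stored only once
    Edge("E1", "N2", "N1", "P1", "OUT"),
    Edge("E2", "N1", "N2", "OUT", "P2"),
]

bad_edge = Edge("E99", "N1", "N9", "P1", "")  # dangling node, missing face
print(adjacent_faces(EDGES))
print(find_violations(EDGES + [bad_edge], NODES, FACES))
```

Because the shared boundary E12 exists only once, the adjacency of P1 and P2 follows directly from its left and right faces, and a malformed edge can be flagged without inspecting any coordinates.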
So how does topology help the Census Bureau assure the accuracy of population data needed for reapportionment and redistricting? To do so, the Bureau must aggregate counts and characteristics to various geographic areas, including blocks, tracts, and voting districts. This involves a process called “address matching” or “address geocoding” in which data collected by household is assigned a topologically-correct geographic location. The following pages explain how that works.
Geocoding is the process used to convert location codes, such as street addresses or postal codes, into geographic (or other) coordinates. The terms “address geocoding” and “address mapping” refer to the same process. Geocoding address-referenced population data is one of the Census Bureau’s key responsibilities. However, as you know, it’s also a very popular capability of online mapping and routing services. In addition, geocoding is an essential element of a suite of techniques that are becoming known as “business intelligence.” We’ll look at applications like these later in this chapter, but first, let’s consider how the Census Bureau performs address geocoding.
Prior to the MAF/TIGER modernization project that led up to the decennial census of 2010, the TIGER database did not include a complete set of point locations for U.S. households. Lacking point locations, TIGER was designed to support address geocoding by approximation. As illustrated below in Figure 4.7.1, the pre-modernization TIGER database included address range attributes for the edges that represent streets. Address range attributes were also included in the TIGER/Line files extracted from TIGER. Coupled with the Start and End nodes bounding each edge, address ranges enable users to estimate locations of household addresses.
Here’s how it works. The diagram above highlights an edge that represents a one-block segment of Oak Avenue. The edge is bounded by two nodes, labeled "Start" and "End." A corresponding record in an attribute table includes the unique ID number (0007654320) that identifies the edge, along with starting and ending addresses for the left (FRADDL, TOADDL) and right (FRADDR, TOADDR) sides of Oak Avenue. Note also that the address ranges include potential addresses, not just existing ones. This is to make sure that the ranges will remain valid as new buildings are constructed along the street.
A common geocoding error occurs when Start and End designations are assigned to the wrong connecting nodes. You may have read in Galdi’s (2005) white paper “Spatial Data Storage and Topology in the Redesigned MAF/TIGER System,” that in MAF/TIGER, “an arbitrary direction is assigned to each edge, allowing designation of one of the nodes as the Start Node, and the other as the End Node” (p. 3). If an edge’s “direction” happens not to correspond with its associated address ranges, a household location may be placed on the wrong side of a street.
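The wrong-side error is easy to reproduce in a sketch. The code below chooses a side of the street by comparing a house number against left and right address ranges (assuming, as is common but not universal, that each side holds numbers of one parity), then offsets the estimated location to the left or right of the edge's direction of travel. The ranges, coordinates, and function names are hypothetical.

```python
import math


def matches_range(number, from_addr, to_addr):
    """True if number falls within the range and shares its odd/even parity."""
    low, high = sorted((from_addr, to_addr))
    return low <= number <= high and number % 2 == from_addr % 2


def side_of_street(number, fraddl, toaddl, fraddr, toaddr):
    """Return 'L', 'R', or None, using the FRADDL/TOADDL/FRADDR/TOADDR ranges."""
    if matches_range(number, fraddl, toaddl):
        return "L"
    if matches_range(number, fraddr, toaddr):
        return "R"
    return None


def place(start, end, fraction, side, offset=10.0):
    """Offset a point on the edge to the left or right of the Start-to-End direction."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    ux, uy = dx / length, dy / length
    nx, ny = (-uy, ux) if side == "L" else (uy, -ux)
    return (start[0] + fraction * dx + offset * nx,
            start[1] + fraction * dy + offset * ny)


# Hypothetical edge running west to east: odd numbers on the left, even on the right.
side = side_of_street(150, fraddl=101, toaddl=199, fraddr=100, toaddr=198)

correct = place((0, 0), (100, 0), 0.5, side)   # Start at the west end
flipped = place((100, 0), (0, 0), 0.5, side)   # Start and End swapped by mistake
print(side, correct, flipped)
```

With the nodes designated correctly, the address lands south of the street (the right-hand side when traveling east). With Start and End swapped but the address ranges unchanged, the same address lands north of the street.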
Although many local governments in the U.S. have developed their own GIS “land bases” with greater geometric accuracy than pre-modernization TIGER/Line files, similar address geocoding errors still occur. Kathryn Robertson, a GIS Technician with the City of Independence, Missouri pointed out how important it is that Start (or "From") nodes and End (or "To") nodes correspond with the low and high addresses in address ranges. "I learned this the hard way," she wrote, "geocoding all 5,768 segments for the city of Independence and getting some segments backward. When address matching was done, the locations were not correct. Therefore, I had to go back and look at the direction of my segments. I had a rule of thumb, all east-west streets were to start from west and go east; all north-south streets were to start from the south and go north" (personal communication).
Although this may have been a sensible strategy for the City of Independence, can you imagine a situation in which Kathryn’s rule-of-thumb might not work for another municipality?
If TIGER had included accurate coordinate locations for every household, and correspondingly accurate streets and administrative boundaries, geocoding census data would be simple and less error-prone. Many local governments digitize locations of individual housing units when they build GIS land bases for property tax assessment, E-911 dispatch, and other purposes. The MAF/TIGER modernization project begun in 2002 aimed to accomplish this for the entire nationwide TIGER database in time for the 2010 census. The illustration below in Figure 4.7.2 shows the intended result of the modernization project, including properly aligned streets, shorelines, and individual household locations, shown here in relation to an orthorectified aerial image.
The modernized MAF/TIGER database described by Galdi (2005) is now in use, including precise geographic locations of over 100 million household units. However, because household locations are considered confidential, users of TIGER/Line Shapefiles extracted from the MAF/TIGER database still must rely upon address geocoding using address ranges.
Launched in 1996, MapQuest was one of the earliest online mapping, geocoding and routing services. MapQuest combined the capabilities of two companies: a cartographic design firm with long experience in producing road atlases, “TripTiks” for the American Automobile Association, and other map products, and a start-up company that specialized in custom geocoding applications for business. Initially, MapQuest relied in part on TIGER/Line street data extracted from the pre-modernization TIGER database. MapQuest and other commercial firms were able to build their businesses on TIGER data because of the U.S. government’s wise decision not to restrict its reuse. It’s been said that this decision triggered the rapid growth of the U.S. geospatial industry.
Later on in this chapter, we’ll visit MapQuest and some of its more recent competitors. Next, however, you'll have a chance to see how geocoding is performed using TIGER/Line data in a GIS.
Part 3 of 3 in the TIGER/Line Shapefile Try This! series is not interactive, but instead illustrates how the address ranges encoded in TIGER/Line Shapefiles can be used to pinpoint (more or less!) the geographic locations of street addresses in the U.S.
The process of geocoding a location within a GIS begins with a line dataset (shapefile) with the necessary address range attributes. The following image is an example of the attribute table of a TIGER/Line shapefile.
This shapefile contains over 29,000 road segments in total. Note the names of some of the attributes:
Next, the GIS software needs to know which of these attributes contains each piece of the necessary address range information. Shapefiles from different sources use different attribute names, so the GIS can't always know which attribute contains, say, the Right-Side-From-Address information. In ArcGIS, something called a Locator is configured to map the attributes in the shapefile to the corresponding pieces of address information. The image below illustrates what this mapping looks like:
We are now ready to find a location by searching for a street address! Let's geocode the location for "1971 Fairwood Lane, 16803".
When an address is specified, the GIS queries the attribute table to find rows with a matching street name in the correct ZIP code. Also, the particular segment of the street that contains the address number is identified. Figure 4.8.3 shows the corresponding selection in the attribute table:
Figure 4.8.4 shows the corresponding road segment highlighted on a map. The To and From address values for the road segment have been added so you can see the range of addresses.
Finally, the GIS interpolates where along the road segment the value of 1971 occurs and places it on the appropriate side of the street based on the even/odd values indicated in the attribute table. Figure 4.8.5 shows the final result of the geocoding process:
The accuracy of a geocoded location is dependent on a number of factors, including the quality of the line work in a shapefile, the accuracy of the address range attributes of each road segment, and the interpolation performed by the software. As you may see in the following section, different geocoding services may provide different location results due to the particular data and procedures used.
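The lookup step described above can be sketched as follows. This is not how ArcGIS implements a Locator; it is a simplified illustration. The field names in `FIELD_MAP` and the address ranges in the sample rows are assumptions, and the street name is matched exactly, whereas real geocoders normalize names such as "Lane" and "Ln" first.

```python
# Hypothetical mapping from address roles to attribute names in the table.
FIELD_MAP = {
    "street": "FULLNAME",
    "left_from": "LFROMADD", "left_to": "LTOADD", "left_zip": "ZIPL",
    "right_from": "RFROMADD", "right_to": "RTOADD", "right_zip": "ZIPR",
}

# Sample rows with made-up address ranges, stored as text as in a dBASE table.
ROWS = [
    {"FULLNAME": "Fairwood Ln", "LFROMADD": "1901", "LTOADD": "1999",
     "RFROMADD": "1900", "RTOADD": "1998", "ZIPL": "16803", "ZIPR": "16803"},
    {"FULLNAME": "Fairwood Ln", "LFROMADD": "2001", "LTOADD": "2099",
     "RFROMADD": "2000", "RTOADD": "2098", "ZIPL": "16803", "ZIPR": "16803"},
    {"FULLNAME": "Oak Ave", "LFROMADD": "1901", "LTOADD": "1999",
     "RFROMADD": "1900", "RTOADD": "1998", "ZIPL": "16803", "ZIPR": "16803"},
]


def in_range(number, from_addr, to_addr):
    """True if number lies in the range and shares its odd/even parity."""
    low, high = sorted((int(from_addr), int(to_addr)))
    return low <= number <= high and number % 2 == low % 2


def find_segment(rows, street, zip_code, number, fields=FIELD_MAP):
    """Return (row, side) for the first segment whose range holds the address."""
    for row in rows:
        if row[fields["street"]].lower() != street.lower():
            continue
        for side in ("left", "right"):
            if row[fields[side + "_zip"]] != zip_code:
                continue
            if in_range(number, row[fields[side + "_from"]],
                        row[fields[side + "_to"]]):
                return row, side
    return None


match = find_segment(ROWS, "Fairwood Ln", "16803", 1971)
print(match)
```

Changing the values in `FIELD_MAP` is all that is needed to use a table whose attributes are named differently, which is the role the Locator configuration plays.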
No doubt you're familiar with one or more popular online mapping services. How well do they do at geocoding the location of a postal address? You can try it out for yourself at several web-based mapping services, including MapQuest.com, Microsoft's Bing Maps, and Tele Atlas/TomTom's Geocode.com (no longer a live site). Tele Atlas, for example, has been a leading manufacturer of digital street data for vehicle navigation systems. To accommodate the routing tasks that navigation systems are called upon to serve, the streets are encoded as vector features whose attributes include address ranges. (In order to submit an address for geocoding at Geocode.com, you have to set up a trial account through their EZ-Locate Interactive web tool or download the EZ-Locate software).
Shown above is the form by which you can geocode an address to a location in a Tele Atlas street database. The result is shown below in Figure 4.9.2.
Let's see how MapQuest.com's geocoding locates the address on an actual map.
The MapQuest.com map from 2013 estimates the address is close to its actual location. Below is a similar MapQuest product created back in 1998. On the older map, the same address is plotted on the opposite side of the street. What do you suppose is wrong with the address range attribute in that case?
On the map from 1998, also note the shapes of the streets. The street shapes in the newer map have been improved. The 1998 product seems to have been generated from the 1990 version of the TIGER/Line files, which may have been all that was available for this relatively remote part of the country. Now MapQuest licenses street data from a business partner called NAVTEQ.
The point of this section is to show that geocoding with address ranges involves a process of estimation. The Census Bureau's TIGER/Line Shapefiles, like the commercial street databases produced by Tele Atlas, Navigation Technologies, and other private firms, represent streets as vector line segments. The vector segments are associated with address range attributes, one for the left side of the street, one for the right side. The geocoding process takes a street address as input, finds the line segment that represents the specified street, checks the address ranges to determine the correct side of the street, then estimates a location at the appropriate point between the minimum and maximum address for that segment and assigns an estimated latitude/longitude coordinate to that location. For example, if the minimum address is 401, and the maximum is 421, a geocoding algorithm would locate address 411 at the midpoint of the street segment.
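The estimation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by any particular geocoding service. The segment layout (a dictionary with "start", "end", "left", and "right" keys) and the function name geocode are hypothetical, the segment is assumed to be a single straight line in a plane coordinate system, and the side of the street is assumed to be determined by whether the house number is odd or even.

```python
def geocode(house_number, segment):
    """Estimate (side, (x, y)) for a house number on one street segment.

    `segment` is a hypothetical record with "start" and "end" coordinates
    and a (low, high) address range for each side of the street.
    Returns None if the number fits neither range.
    """
    (x0, y0), (x1, y1) = segment["start"], segment["end"]
    for side in ("left", "right"):
        low, high = segment[side]
        # Assumption: odd and even numbers are on opposite sides of the street.
        same_parity = house_number % 2 == low % 2
        if low <= house_number <= high and same_parity:
            # Linear interpolation between the minimum and maximum address.
            fraction = 0.0 if high == low else (house_number - low) / (high - low)
            return side, (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0))
    return None


# A 100-unit segment with even addresses on the left and odd on the right.
segment = {
    "start": (0.0, 0.0),
    "end": (100.0, 0.0),
    "left": (400, 420),
    "right": (401, 421),
}
print(geocode(411, segment))  # address 411 falls at the midpoint of the segment
```

If the address range stored for a segment is wrong, or assigned to the wrong side, the sketch shows why the estimated location shifts along the street or lands on the opposite side.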
Try one of these geocoding services for your address. Then compare the experience, and the result, with Google Maps, launched in 2005. Apply what we've discussed in this chapter to try to explain inaccuracies in your results, if any.
Two characteristics of MAF/TIGER data, address range attributes and explicit topology, make them, and derivative products, valuable in many contexts. Consequently, firms like NAVTEQ and Tele Atlas (now owned by TomTom) emerged to provide data with characteristics similar to those of MAF/TIGER, but which are more up-to-date, more detailed, and include additional feature classes. The purpose of the next section is to sketch some of the applications of data similar to MAF/TIGER data beyond the Census Bureau.
A February 2006 article by Peter Valdes-Dapena in CNNMoney.com describes the work of two NAVTEQ employees. See the link above or search on "where those driving directions really come from."
Geocoded addresses allow governments and businesses to map where their constituents and customers live and work. Federal, state, and local government agencies know where their constituents live by virtue of censuses, as well as applications for licenses and registrations. Banks, credit card companies, and telecommunications firms are also rich in address-referenced customer data, including purchasing behaviors. Other private businesses and services must be more resourceful.
Some retail operations, for example, request addresses or ZIP Codes from customers, or capture address data from checks. Discount and purchasing club cards allow retailers to directly match purchasing behaviors with addresses. Customer addresses can also be harvested from automobile license plates. Business owners pay to record license plate numbers of cars parked in their parking lots or in those of their competitors. Addresses of registered owners can be purchased from organizations that acquire motor vehicle records from state departments of transportation.
Businesses with access to address-referenced customer data, vector street data attributed with address ranges, and GIS software and expertise, can define and analyze the trade areas within which most of their customers live and work. Companies can also focus direct mail advertising campaigns on their own trade areas, or their competitors'. Furthermore, GIS can be used to analyze the socio-economic characteristics of the population within trade areas, enabling businesses to make sure that the products and services they offer meet the needs and preferences of target populations.
Politicians use the same tools to target appearances and campaign promotions.
Check out the geocoding system maintained by the Federal Financial Institutions Examination Council (FFIEC). The FFIEC Geocoding system lets users enter a street address and get a census demographic report or a street map (using Tele Atlas data). The system is intended for use by financial institutions that are covered by the Home Mortgage Disclosure Act (HMDA) and Community Reinvestment Act (CRA) to meet their reporting obligations.
Operations such as mail and package delivery, food and beverage distribution, and emergency medical services need to know not only where their customers are located, but how to deliver products and services to those locations as efficiently as possible. Geographic data products like TIGER/Line Shapefiles are valuable to analysts responsible for prescribing the most efficient delivery routes. The larger and more complex the service areas of such organizations, the more incentive they have to automate their routing procedures.
In its simplest form, routing involves finding the shortest path through a network from an origin to a destination. Although shortest path algorithms were originally implemented in raster frameworks, transportation networks are now typically represented with vector feature data, like TIGER/Line Shapefiles. Street segments are represented as digital line segments each formed by two points, a "start" node and an "end" node. If the nodes are specified within geographic or plane coordinate systems, the distance between them can be calculated readily. Routing procedures sum the lengths of every plausible sequence of line segments that begin and end at the specified locations. The sequence of segments associated with the smallest sum represents the shortest route.
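To make the idea concrete, here is a small Python sketch of a shortest path search over a network of street segments. Rather than summing every plausible sequence of segments, it uses Dijkstra's algorithm, a standard technique that reaches the same answer without enumerating all sequences. The node names, coordinates, and the function name shortest_path are invented for illustration, segments are assumed to be straight and traversable in both directions, and coordinates are assumed to be in a plane coordinate system.

```python
import heapq
import math


def shortest_path(nodes, segments, origin, destination):
    """Return (total_length, [node, ...]) or None if no route exists."""
    # Build an adjacency list; each segment's length comes from its two nodes.
    adjacency = {name: [] for name in nodes}
    for a, b in segments:
        length = math.dist(nodes[a], nodes[b])
        adjacency[a].append((b, length))
        adjacency[b].append((a, length))

    best = {origin: 0.0}       # shortest known distance to each node
    previous = {}              # node visited just before each node
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == destination:
            break
        if dist > best.get(node, math.inf):
            continue           # an outdated queue entry
        for neighbor, length in adjacency[node]:
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))

    if destination not in best:
        return None
    # Walk backward from the destination to recover the sequence of nodes.
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return best[destination], path[::-1]


nodes = {"A": (0, 0), "B": (4, 0), "C": (4, 3), "D": (0, 6)}
segments = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
print(shortest_path(nodes, segments, "A", "C"))
```

In this toy network, the route through B totals 7 units and the route through D totals 11, so the search returns the route through B.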
To compare various possible sequences of segments, the data must indicate which line segment follows immediately after another line segment. In other words, the procedure needs to know about the connectivity of features. As discussed earlier, connectivity is an example of a topological relationship. If topology is not encoded in the data product, it can be calculated by the GIS software in which the procedure is coded.
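As a rough illustration of how software might calculate connectivity when it is not encoded in the data, the Python sketch below treats two segments as connected whenever they share an endpoint. The segment identifiers and the function name build_connectivity are hypothetical, and the sketch assumes that shared endpoints have exactly identical coordinates. Real data often require a snapping tolerance to account for small coordinate differences.

```python
from collections import defaultdict


def build_connectivity(segments):
    """Map each segment id to the set of segment ids that share an endpoint."""
    # First, list the segments that meet at each node.
    at_node = defaultdict(set)
    for seg_id, (start, end) in segments.items():
        at_node[start].add(seg_id)
        at_node[end].add(seg_id)

    # Then, connect every pair of segments that meet at the same node.
    connected = {seg_id: set() for seg_id in segments}
    for ids in at_node.values():
        for seg_id in ids:
            connected[seg_id] |= ids - {seg_id}
    return connected


segments = {
    "main_1": ((0, 0), (1, 0)),
    "main_2": ((1, 0), (2, 0)),
    "oak": ((1, 0), (1, 1)),
    "elm": ((5, 5), (6, 5)),   # touches no other segment
}
print(build_connectivity(segments))
```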
Several online travel planning services, including MapQuest.com and Google Maps, provide routing capabilities. Both take origin and destination addresses as input, and produce optimal routes as output. These services are based on vector feature databases in which street segments are attributed with address ranges, as well as with other data that describe the type and conditions of the roads they represent.
The shortest route is not always the best. In the context of emergency medical services, for example, the fastest route is preferred, even if it entails longer distances than others. To determine fastest routes, additional attribute data must be encoded, such as speed limits, traffic volumes, one-way streets, and other characteristics.
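The difference between shortest and fastest can be shown with a little arithmetic. In the Python sketch below, each segment carries a length and a speed attribute, and travel time is estimated as length divided by speed. The two routes, their lengths in miles, and their speeds in miles per hour are made-up values, and the estimate ignores traffic volumes, turns, and stops.

```python
def route_totals(route):
    """Return (miles, minutes) for a route given as (miles, mph) segments."""
    miles = sum(length for length, _ in route)
    minutes = sum(length / mph * 60 for length, mph in route)
    return miles, minutes


direct = [(2.0, 25), (1.0, 25)]   # 3 miles on surface streets
bypass = [(1.0, 25), (4.0, 60)]   # 5 miles, mostly on a highway

for name, route in (("direct", direct), ("bypass", bypass)):
    miles, minutes = route_totals(route)
    print(f"{name}: {miles:.1f} miles, {minutes:.1f} minutes")
```

Here the bypass is two miles longer but takes less time, so a procedure that minimizes travel time would choose it. A routing procedure can use travel time in place of length as the cost of each segment, and can represent a one-way street by allowing travel along the segment in only one direction.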
Then there are routing problems that involve multiple destinations--a complex special case of routing called the traveling salesman problem. School bus dispatchers, mail and package delivery service managers, and food and beverage distributors all seek to minimize the transportation costs involved in servicing multiple, dispersed destinations. As the number of destinations and the costs of travel increase, the high cost of purchasing up-to-date, properly attributed network data becomes easier to justify.
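For a handful of destinations, the traveling salesman problem can be solved by brute force: try every order of visiting the stops and keep the shortest round trip. The Python sketch below does just that. The stop names, coordinates, and function names are invented for illustration, and it assumes straight-line distances between stops rather than distances along a street network. Because the number of possible orders grows factorially with the number of stops, this approach is practical only for very small problems.

```python
import itertools
import math


def tour_length(points, order):
    """Sum the straight-line distances between consecutive stops."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def best_tour(points, depot):
    """Return (length, order) for the shortest round trip from the depot."""
    stops = [name for name in points if name != depot]
    # Every permutation of the stops is one candidate round trip.
    candidates = ((depot, *perm, depot) for perm in itertools.permutations(stops))
    best = min(candidates, key=lambda order: tour_length(points, order))
    return tour_length(points, best), best


points = {"depot": (0, 0), "a": (0, 3), "b": (4, 3), "c": (4, 0)}
print(best_tour(points, "depot"))
```

The four locations form a 4-by-3 rectangle, so the shortest round trip follows its perimeter, a total of 14 units.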
The Georgia Institute of Technology publishes an extensive collection of resources about the Traveling Salesman Problem. Go to this site: http://www.gatech.edu/ and type traveling salesman in the search box.
The need to redraw voting district boundaries every ten years was one of the motivations that led the Census Bureau to create its MAF/TIGER database. Like voting districts, many other kinds of service area boundaries need to be revised periodically. School districts are a good example. The state of Massachusetts, for instance, has adopted school districting laws that are similar in effect to the constitutional criteria used to guide congressional redistricting. The Framingham (Massachusetts) School District's Racial Balance Policy once stated that "each elementary and middle school shall enroll a student body that is racially balanced. ... each student body shall include a percentage of minority student, which reflects the system-wide percentage of minority students, plus or minus ten percent. ... The racial balance required by this policy shall be established by redrawing school enrollment areas" (Framingham Public Schools 1998). And bus routes must be redrawn as enrollment area boundaries change.
The Charlotte-Mecklenburg (North Carolina) public school district also used racial balance as a districting criterion (although its policy was subsequently challenged in court). Charlotte-Mecklenburg consists of 133 schools, attended by over 100,000 students, about one-third of whom ride a bus to school every day. District managers are responsible for planning 3,600 bus routes, traveling a total of 82,000 daily miles. A staff of eight routinely uses GIS to manage these tasks. GIS could not be used unless up-to-date, appropriately attributed, and topologically encoded data were available.
Another example of service area analysis is provided by the City of Beaverton, Oregon. In 1997, Beaverton officials realized that 25 percent of the volume of solid waste that was hauled away to landfills consisted of yard waste, such as grass clippings and leaves. Beaverton decided to establish a yard waste recycling program, but it knew that the program would not be successful if residents found it inconvenient to participate. A GIS procedure called allocation was used to partition Beaverton's street network into service areas that minimized the drive time from residents' homes to recycling facilities. Allocation procedures require vector-format data that includes the features, attributes, and topology necessary to calculate travel times from all residences to the nearest facility.
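The general idea behind allocation can be sketched in Python as a shortest path search that starts from all facilities at once, so that every network node ends up assigned to the facility with the smallest travel time. This is an illustrative sketch, not the procedure Beaverton used. The node names, travel times, and the function name allocate are hypothetical, and the sketch assumes that every segment can be traveled in both directions with the same positive travel time.

```python
import heapq


def allocate(travel_minutes, facilities):
    """Assign each reachable node to the facility with the least travel time."""
    adjacency = {}
    for (a, b), minutes in travel_minutes.items():
        adjacency.setdefault(a, []).append((b, minutes))
        adjacency.setdefault(b, []).append((a, minutes))

    # best[node] holds (minutes, facility) for the closest facility found so far.
    best = {facility: (0.0, facility) for facility in facilities}
    queue = [(0.0, facility, facility) for facility in facilities]
    heapq.heapify(queue)
    while queue:
        minutes, node, facility = heapq.heappop(queue)
        if (minutes, facility) != best[node]:
            continue  # an outdated queue entry
        for neighbor, cost in adjacency.get(node, []):
            candidate = (minutes + cost, facility)
            if neighbor not in best or candidate < best[neighbor]:
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate[0], neighbor, facility))
    return {node: facility for node, (_, facility) in best.items()}


# A single street with a facility at each end and three homes in between.
travel_minutes = {
    ("F1", "h1"): 2,
    ("h1", "h2"): 3,
    ("h2", "h3"): 2,
    ("h3", "F2"): 1,
}
print(allocate(travel_minutes, ["F1", "F2"]))
```

In this example, home h1 is allocated to facility F1, while h2 and h3 are allocated to F2. The set of nodes assigned to each facility is its service area.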
Naturally, private businesses concerned with delivering products and services are keenly interested in service area delineation. The screen capture above shows two trade areas surrounding a retail store location ("Seattle Downtown") in a network database.
Former student Saskia Cohick (Winter 2006), who was then GIS Director for Tioga County, Pennsylvania, contributed another service area problem: "This is a topic that local governments are starting to deal with ... To become Phase 2 wireless capable (that is, capable of finding a cell phone location from a 911 call center within 200 feet of the actual location), county call centers must have a layer called ESZs (Emergency Service Zones). This layer will tell the dispatcher who to send to the emergency (police, fire, medical, etc). The larger problem is to reach an agreement between four fire companies (for example) as to where they do or do not respond."
To fulfill its mission of being the preeminent producer of attribute data about the population and economy of the United States, the U.S. Census Bureau also became an innovative producer of digital geographic data. The Bureau designed its MAF/TIGER database to support automatic geocoding of address-referenced census data, as well as automatic data quality control procedures. The key characteristics of TIGER/Line Shapefiles, including the use of vector features to represent geographic entities, and address range attributes to enable address geocoding, are now common features of proprietary geographic databases used for trade area analysis, districting, routing, and allocation.