Digital data are stored in computers as files. Often, data are arranged in tabular form. For this reason, data files are often called tables. A database is a collection of tables specifically designed for efficient retrieval and use. Businesses and government agencies that serve large clienteles, such as telecommunications companies, airlines, credit card firms, and banks, rely on extensive databases for their billing, payroll, inventory, and marketing operations. Database management systems (DBMS) are information systems that people use to store, update, and analyze non-geographic databases.
Often, data files are composed of rows and columns. Rows, also known as records, correspond to individual entities, such as a customer account or a city. Columns correspond to the various attributes associated with each individual entity. The attributes stored in the accounts database of a telecommunications company, for example, might include customer names, telephone numbers, addresses, current charges for local calls, long distance calls, taxes, etc.
Geographic data are a special case: records typically correspond with places. Columns represent the attributes of places. The data in the following table, for example, consist of records for Pennsylvania counties. Columns contain selected attributes of each county, including the county's ID code (FIPS code), name (County), and 1980 population (1980 Pop).
FIPS Code | County | 1980 Pop |
---|---|---|
42001 | Adams County | 78274 |
42003 | Allegheny County | 1336449 |
42005 | Armstrong County | 73478 |
42007 | Beaver County | 186093 |
42009 | Bedford County | 47919 |
42011 | Berks County | 336523 |
42013 | Blair County | 130542 |
42015 | Bradford County | 60967 |
42017 | Bucks County | 541174 |
42019 | Butler County | 152013 |
42021 | Cambria County | 163062 |
42023 | Cameron County | 5913 |
42025 | Carbon County | 56846 |
42027 | Centre County | 124812 |
Figure 4.9: The contents of one file in a database
Credit: Department of Geography, The Pennsylvania State University.
The example is a small and very simple file, but many geographic attribute databases are in fact large and complex (the U.S. is made up of over 3,000 counties, almost 50,000 census tracts, about 43,000 five-digit ZIP code areas and many tens of thousands more ZIP+4 code areas). Large databases consist not only of lots of data, but also lots of files. Unlike a spreadsheet, which performs calculations only on data that are present in a single document, database management systems allow users to store data in, and retrieve data from, many separate tables (which might be stored within a single database or perhaps as separate files). For example, suppose an analyst wished to calculate population change for Pennsylvania counties between the 1980 and 1990 censuses. More than likely, 1990 population data would exist in a separate table, like so:
FIPS Code | 1990 Pop |
---|---|
42001 | 84921 |
42003 | 1296037 |
42005 | 73872 |
42007 | 187009 |
42009 | 49322 |
42011 | 352353 |
42013 | 131450 |
42015 | 62352 |
42017 | 578715 |
42019 | 167732 |
42021 | 158500 |
42023 | 5745 |
42025 | 58783 |
42027 | 131489 |
Figure 4.10: Another file in a database. A database management system (DBMS) can relate this file to the prior one illustrated above because they share the list of attributes called "FIPS Code."
Credit: Department of Geography, The Pennsylvania State University.
A database management system (DBMS) can relate this table to the prior one illustrated above because they share the list of attributes called "FIPS Code." If two data table have at least one common attribute (e.g., FIPS Code), a DBMS can combine them in a single new table. The common attribute is called a key, and can be used for associating the individual records in the two tables. In this example, the key was the county FIPS code (FIPS stands for Federal Information Processing Standard), allowing the user to merge both tables into one. The DBMS also allows users to create new data such as the "% Change" attribute in the table below calculated from the 1980 and 1990 population totals that were merged together.
FIPS | County | 1980 | 1990 | % Change |
---|---|---|---|---|
42001 | Adams | 78274 | 84921 | 8.5 |
42003 | Allegheny | 1336449 | 1296037 | -3 |
42005 | Armstrong | 73478 | 73872 | 0.5 |
42007 | Beaver | 186093 | 187009 | 0.5 |
42009 | Bedford | 47919 | 49322 | 2.9 |
42011 | Berks | 336523 | 352353 | 4.7 |
42013 | Blair | 130542 | 131450 | 0.7 |
42015 | Bradford | 60967 | 62352 | 2.3 |
42017 | Bucks | 541174 | 578715 | 6.9 |
42019 | Butler | 152013 | 167732 | 10.3 |
42021 | Cambria | 163062 | 158500 | -2.8 |
42023 | Cameron | 5913 | 5745 | -2.8 |
42025 | Carbon | 56846 | 58783 | 3.4 |
42027 | Centre | 124812 | 131489 | 5.3 |
Figure 4.11: A new file produced from the prior two files as a result of two database operations. One operation merged the contents of the two files without redundancy. A second operation produced a new attribute--"% Change"--dividing the difference between "1990 Pop" and "1980 Pop" by "1980 Pop" and expressing the result as a percentage.
Credit: Department of Geography, The Pennsylvania State University.
Above, a new table is produced from the prior two tables as a result of two database operations. One operation merged the contents of the two tables. A second operation produced a new attribute--"% Change"--dividing the difference between "1990 Pop" and "1980 Pop" by "1980 Pop" and expressing the result as a percentage.
Database management systems provide a simple but powerful language that makes data retrieval and manipulation easy. These data can be retrieved and manipulated based upon user specified criteria, enabling users to select data in response to particular questions. A question that is addressed to a database through a DBMS is called a query. In addition, DBMS are valuable because they provide secure means of storing and updating data. Database administrators can also protect files so that only authorized users can make changes and provide transaction management functions that allow multiple users to edit the database simultaneously.
Database queries include basic set operations, including union, intersection, and difference. The product of a union of two or more data files is a single file that includes all records and attributes for features that appear in one file or the other, with records in common merged to avoid repetition. For example, if one wanted to find what both the coyote and red fox could prey upon, you can perform a union by combining the entire area that encompasses the territory of the coyote and the red fox.
An intersection produces a data file that contains only records that are present in all files. This is the area where both animals may compete for food, or where they overlap in territory. A difference operation produces a data file that eliminates records that appear in both original files. The difference of the red fox territory and the coyote territory produces places in which the predation may be lower and the stress of competition less.
Try This
Draw Venn diagrams--intersecting circles that show relationships between two or more entities--to illustrate the three operations. Then compare your sketch to this one.) As mentioned earlier in the chapter, all operations that involve multiple data files rely on the fact that all files contain a common key. The key allows the database system to relate the separate files. Databases that contain numerous files that share one or more keys are called relational databases. Database systems that enable users to produce information from relational databases are called relational database management systems. In the example above, if data on foxes and coyotes were aggregated to watersheds, then the watershed specification could act as the geographic key for connecting the two sets of data.
4.3.1 Available Tools
Numerous tools exist to help users perform database management operations. Microsoft Excel and Access allow users to retrieve specific records, manipulate the records, and create new user content. ESRI’s ArcGIS allows users to query and manipulate files, but also map the geographic database files in order to find interesting spatial patterns and processes in graphic form.