Critical to maintaining spatial databases is the job of ensuring the integrity of your spatial and nonspatial data. There are several behaviors that you can impose upon your data that will give you access to robust methods of assessing the integrity of existing data and of ensuring the integrity of newly created data. This lesson will expose you to some of those techniques.
Note that there are two sets of pages in this lesson: the first is written for ArcGIS Pro and the second for ArcGIS Desktop (ArcMap). I strongly encourage you to work through the ArcGIS Pro pages, since Pro is the more modern of the two packages. If you do, there is no need to also work through the ArcMap pages unless you are particularly interested. But if you have good reason to avoid Pro, such as having no experience with it or working in an environment where you are required to use ArcMap, you have that option.
The point has been made that best practice is to think through your database design before you implement it. That philosophy certainly holds for what we will cover in this lesson. However, to illustrate the concepts we want to cover, it helps to already have some existing data, both spatial and attribute. Tracing spatial features and entering attribute values are largely outside the scope of this lesson, so we will apply the various techniques to existing data.
At the successful completion of this lesson, students should be able to:
Conversation and comments in this course will take place within the course discussion forums.
Lesson 5 is one week in length. See the Canvas Calendar for specific due dates. To finish this lesson, you must complete the activities listed below:
In Geography 484: GIS Database Development, you were introduced to attribute domains; however, it is possible to go through that course without having actually implemented them. So we will go over Coded Value Domains and Range Domains in the following section. Some of the material will be review for some of you, but even if you did work with attribute domains in Geography 484, you are apt to be exposed to functionality associated with them that was not covered in that course.
Follow this link to download the data for this part of the lesson: AttributeDomains.zip [1]
The zip archive contains two Esri File Geodatabases:
A coded value domain allows you to choose from a list of values that are allowed in a field. A range domain allows you to specify a valid range for values in a numeric field. Both types of attribute domain thus provide means of enforcing data integrity.
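The difference between the two domain types comes down to the check each one applies. A coded value domain tests whether a value is in a list. A range domain tests whether a number falls within an interval. The sketch below models both checks in plain Python. It is not arcpy and does not read or write a geodatabase. The domain names, codes, field names, and records (RoofType, Stories, and so on) are hypothetical and are included only to illustrate the logic.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class CodedValueDomain:
    """Allowed values are the keys of `codes`; each code has a description."""
    name: str
    codes: Dict[int, str]

    def is_valid(self, value: Any) -> bool:
        return value in self.codes


@dataclass(frozen=True)
class RangeDomain:
    """Allowed values are numbers between min_value and max_value, inclusive."""
    name: str
    min_value: float
    max_value: float

    def is_valid(self, value: Any) -> bool:
        return isinstance(value, (int, float)) and self.min_value <= value <= self.max_value


def find_invalid(records: List[Dict[str, Any]],
                 field_domains: Dict[str, Any]) -> List[Tuple[int, str, Any]]:
    """Return (OBJECTID, field, value) for every value that violates its field's domain."""
    problems = []
    for rec in records:
        for fld, domain in field_domains.items():
            if not domain.is_valid(rec.get(fld)):
                problems.append((rec["OBJECTID"], fld, rec.get(fld)))
    return problems


# Hypothetical domains and records, for illustration only.
roof_type = CodedValueDomain("RoofType", {1: "Flat", 2: "Gable", 3: "Hip"})
stories = RangeDomain("Stories", 1, 50)

buildings = [
    {"OBJECTID": 1, "RoofType": 2, "Stories": 3},
    {"OBJECTID": 2, "RoofType": 7, "Stories": 2},   # 7 is not a listed code
    {"OBJECTID": 3, "RoofType": 1, "Stories": 0},   # 0 is below the allowed range
]

print(find_invalid(buildings, {"RoofType": roof_type, "Stories": stories}))
# [(2, 'RoofType', 7), (3, 'Stories', 0)]
```

The sketch reproduces only the membership test and the interval test. It does not show how ArcGIS stores domains or when ArcGIS enforces them.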
Additional behaviors that can accompany attribute domains allow us to define what values get assigned to the field when a feature is split into two or when two features are merged into one. According to the Esri documentation, "...merge policies are not evaluated when merging features in the Editor. However, developers can take advantage of merge policies when writing their own merge implementations." The scope of this lesson does not include developer-level exercises, so we will visit only the use of split policies.
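A split policy is a rule for deriving each new feature's attribute value from the original feature's value. The plain-Python sketch below models the three split policies that ArcGIS has traditionally offered for attribute domains: default value, duplicate, and geometry ratio. arcpy's Create Domain tool labels them DEFAULT, DUPLICATE, and GEOMETRY_RATIO. The function itself is hypothetical, and the parcel value and areas in the example are made up.

```python
from typing import List, Optional


def apply_split_policy(policy: str, original_value: Optional[float],
                       default_value: Optional[float],
                       part_sizes: List[float]) -> List[Optional[float]]:
    """Return the attribute value each part receives when one feature is split.

    part_sizes holds the length (lines) or area (polygons) of each new part,
    and is assumed to sum to more than zero.
    """
    if policy == "DEFAULT":
        # Every part gets the field's default value, not the original value.
        return [default_value for _ in part_sizes]
    if policy == "DUPLICATE":
        # Every part keeps a copy of the original value.
        return [original_value for _ in part_sizes]
    if policy == "GEOMETRY_RATIO":
        # The numeric value is divided in proportion to each part's share of the total size.
        total = sum(part_sizes)
        return [original_value * size / total for size in part_sizes]
    raise ValueError(f"Unknown split policy: {policy}")


# Hypothetical example: a parcel assessed at 200,000 is split into parts of 300 and 100 square meters.
print(apply_split_policy("GEOMETRY_RATIO", 200000, 0, [300, 100]))  # [150000.0, 50000.0]
print(apply_split_policy("DUPLICATE", 200000, 0, [300, 100]))       # [200000, 200000]
print(apply_split_policy("DEFAULT", 200000, 0, [300, 100]))         # [0, 0]
```

Geometry ratio only makes sense for numeric quantities that scale with size, such as an assessed value. A categorical code, such as a zoning type, is better duplicated.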
For the homework deliverable for this part of the lesson, I am going to have you create two attribute domains to be assigned to existing fields in a feature class attribute table.
The data you will use is found in the AttributeDomainsProject.gdb file geodatabase. That geodatabase contains a single feature class named BuildingFootprints.
Because attribute domains are stored in the geodatabase, they can be used by any dataset within that geodatabase: a feature class, a table, or a subtype (covered in the next part of the lesson).
You can now move on to the Subtypes part of the lesson.
Credit for all screenshots: © Penn State is licensed under CC BY-NC-SA 4.0 [5]
Sometimes we have many similar features, such as roads, that we want to assign separate attribute values to, symbolize differently, and so on. In that situation, we can designate them as what Esri calls subtypes. The alternative would be to create a separate feature class for, in this example, each type of road. Maintaining many separate feature classes in that way can degrade the performance and ease of use of the database.
Subtypes can be created for spatial features in a feature class or for objects (records) in a nonspatial, standalone table. Subtypes can be based on an existing field, or a new field can be created to hold the subtype values.
A subtype is defined by a code and a description. The codes are stored in a field in the attribute table and must be either Short or Long Integer values. Each code value is equated to a descriptive category. Sounds a lot like a coded value domain but, as you will see, they are not the same.
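Subtype codes must be Short or Long Integers, so a code is usable only if it fits the field's integer type. The plain-Python sketch below (not arcpy) checks candidate codes against the signed 16-bit and 32-bit integer ranges and then looks up a record's description from its code. The category names and the RoadType field name are hypothetical placeholders, not the categories in the exercise data.

```python
from typing import Dict

# Signed integer ranges for the two field types that subtype codes may use.
INTEGER_RANGES = {
    "SHORT": (-2**15, 2**15 - 1),   # -32,768 to 32,767
    "LONG": (-2**31, 2**31 - 1),    # -2,147,483,648 to 2,147,483,647
}


def define_subtypes(field_type: str, subtypes: Dict[int, str]) -> Dict[int, str]:
    """Validate a code -> description mapping for the given subtype field type."""
    low, high = INTEGER_RANGES[field_type]
    for code in subtypes:
        if isinstance(code, bool) or not isinstance(code, int):
            raise TypeError(f"Subtype code {code!r} is not an integer")
        if not low <= code <= high:
            raise ValueError(f"Subtype code {code} does not fit in a {field_type} field")
    return dict(subtypes)


# Hypothetical road categories, for illustration only.
road_subtypes = define_subtypes("SHORT", {
    1: "Interstate", 2: "US Highway", 3: "State Route",
    4: "County Road", 5: "Local Street", 6: "Private Road",
})

road = {"OBJECTID": 17, "RoadType": 5}
print(road_subtypes[road["RoadType"]])  # Local Street
```

A coded value domain only constrains which values a field may hold. A subtype code does more: the geodatabase uses it to decide which default values and domains apply to a record.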
In our exercise example, we will be creating road subtypes from integer values that represent six different road type categories.
Follow this link to download the data for this part of the lesson: Subtypes.zip [6]
The zip archive contains two Esri file geodatabases:
We are going to place roads in subtype categories according to the values in an existing field that contains integer values that represent six road type categories.
So, instead of creating separate feature classes for the different road types, we put the road line features into subtypes. That gave us the ability to specify unique default values for new features in each road type category. It also let us choose, for each category, whether particular attribute domains are assigned.
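In this setup, each subtype carries its own default values and its own choice of domains. The plain-Python sketch below models that behavior. It is a conceptual model, not arcpy. The field names (RoadType, SpeedLimit, Lanes), the defaults, and the allowed speed limits are hypothetical. The same SpeedLimit field has different allowed values depending on the subtype, and one subtype has no domain on that field at all.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class SubtypeRules:
    description: str
    defaults: Dict[str, Any] = field(default_factory=dict)
    # Field name -> allowed values (a coded-value-style domain); omit a field for "no domain".
    domains: Dict[str, Set[Any]] = field(default_factory=dict)


# Hypothetical rules for two road subtypes.
ROAD_SUBTYPES = {
    1: SubtypeRules("Interstate", {"SpeedLimit": 65, "Lanes": 4}, {"SpeedLimit": {55, 65, 70}}),
    6: SubtypeRules("Private Road", {"SpeedLimit": 15, "Lanes": 1}),  # no domain on SpeedLimit
}


def new_feature(subtype_code: int, **values: Any) -> Dict[str, Any]:
    """Create a record pre-filled with its subtype's defaults; explicit values override them."""
    rules = ROAD_SUBTYPES[subtype_code]
    return {"RoadType": subtype_code, **rules.defaults, **values}


def domain_violations(record: Dict[str, Any]) -> List[str]:
    """Return the fields whose values fall outside the domain assigned for this record's subtype."""
    rules = ROAD_SUBTYPES[record["RoadType"]]
    return [fld for fld, allowed in rules.domains.items() if record.get(fld) not in allowed]


print(new_feature(1))                                     # {'RoadType': 1, 'SpeedLimit': 65, 'Lanes': 4}
print(domain_violations(new_feature(1, SpeedLimit=25)))  # ['SpeedLimit']
print(domain_violations(new_feature(6, SpeedLimit=25)))  # []
```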
As was mentioned above, subtypes can be created for objects (records) in a nonspatial table. In this homework project, that is what you will do.
Investigate the field structure and attribute data in the Encounters table residing in the SubtypesProject.gdb geodatabase. In addition to the OBJECTID field that is automatically created by the GIS, you will find the following five fields: ID, Encounter, Category, EducationLevel, and BloodAlcohol.
The values of 1 and 2 in the Encounter field represent Distant Encounters and Close Encounters, respectively.
The descriptions of the values in the Category field are as follows:
The values in the EducationLevel field are the number of years of school attended.
The values in the BloodAlcohol field are blood alcohol content measures (see the Wikipedia Blood alcohol content page [8]).
Here is what I want you to do for this project:
A geodatabase topology is another construct that is stored within a geodatabase and gives us added control over assessing and maintaining the integrity of our spatial data.
Follow this link to download the data for this part of the lesson: Topology.zip [9]
The zip archive contains the following:
An Esri File Geodatabase: geodatabasetopol.gdb
A zip archive: TopologyProject.zip
A geodatabase topology provides a robust way of defining topological relationships among spatial features. It does so by analyzing the coordinate locations of feature vertices, both among features within a feature class and between features in the multiple feature classes taking part in the topology. Therefore, all of the feature classes participating in a geodatabase topology must be in the same coordinate system, and the measurement precision defined for each feature class must also be the same. To ensure that this is the case, all feature classes that take part in a geodatabase topology must reside within what is known as a Feature Dataset. When a feature dataset is created, its coordinate system and precision are defined, and any feature class subsequently added to the feature dataset inherits that coordinate system and precision. In the exercise that follows, you will see that the precision is controlled by the Tolerance and Resolution settings. I encourage you to read more about these topics in the Topology in ArcGIS [10] entry in the ArcGIS Pro documentation.
A geodatabase topology is governed by topology "rules" that the user specifies, and those rules are based on knowledge of the relationships between and among the features that will take part in the topology. So, the onus is on the user to understand the data being created or edited so that appropriate rules can be specified.
A geodatabase topology is stored in the feature dataset and is displayed as a layer on a map. As such, the topology errors it contains are symbolized just like the features in any other map layer in ArcGIS. In the documentation, the Validate and fix geodatabase topology [11] topic provides an overview of the error-fixing process, along with links to the rules available for points, polylines, and polygons. If you are inclined to adorn your walls with GIS help guides, you may want to print the topology rules poster [12]. Whether or not you print it, the poster offers more graphic descriptions and examples of the rules.
Once topology rules have been imposed on the data in a feature dataset, errors are discovered by "validating" the topology. Validation can be done on the entire visible extent of the data or on a smaller, specified area. The latter lets you check just the area you have been editing rather than the entire dataset, which can save time when the dataset is large.
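Resolution and tolerance are easier to reason about with numbers. In the plain-Python sketch below, resolution is modeled as the spacing of a grid onto which every stored coordinate is rounded. Tolerance is modeled as the distance within which two vertices are treated as the same location. The 0.0001 m resolution and 0.001 m tolerance match the defaults ArcGIS documents for data in meters. The grid origin at 0 and the sample coordinates are simplifying assumptions. The last lines show why feature classes in one topology should share a resolution: the same measured coordinate is stored differently on two different grids.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def stored_coordinate(value: float, resolution: float) -> float:
    """Round a coordinate onto the resolution grid (grid origin assumed to be 0)."""
    return round(value / resolution) * resolution


def coincident(p: Point, q: Point, tolerance: float) -> bool:
    """Treat two vertices as the same location if they are within the tolerance."""
    return math.dist(p, q) <= tolerance


RESOLUTION = 0.0001  # meters
TOLERANCE = 0.001    # meters

# Two vertices half a millimeter apart are coincident; two millimeters apart are not.
print(coincident((100.0, 200.0), (100.0003, 200.0004), TOLERANCE))  # True
print(coincident((100.0, 200.0), (100.002, 200.0), TOLERANCE))      # False

# The same measured x value, stored at the default resolution and at a coarser one.
x = 1.23456
print(stored_coordinate(x, RESOLUTION), stored_coordinate(x, 0.001))
```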
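Validating a specified area saves time because the check only needs to cover features that recent edits could have affected. The plain-Python sketch below approximates that idea. It selects the features whose bounding boxes (envelopes) intersect the area being validated. This is only the spatial-filter idea, not Esri's validation implementation. The feature IDs, envelopes, and edited area are hypothetical.

```python
from typing import Dict, List, Tuple

Envelope = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def envelopes_intersect(a: Envelope, b: Envelope) -> bool:
    """True if two envelopes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def features_to_validate(envelopes: Dict[int, Envelope], area: Envelope) -> List[int]:
    """Return the IDs of features whose envelopes intersect the area being validated."""
    return sorted(fid for fid, env in envelopes.items() if envelopes_intersect(env, area))


# Hypothetical footprint envelopes and an area that was just edited.
footprints = {
    1: (0, 0, 10, 10),
    2: (12, 0, 20, 8),
    3: (500, 500, 510, 512),
}
edited_area = (5, 5, 15, 15)
print(features_to_validate(footprints, edited_area))  # [1, 2]
```

In a large dataset, only the few features near an edit need checking, rather than every feature.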
As is mentioned above, one needs to be aware of how the features involved in a geodatabase topology relate to each other in order to be able to define appropriate topology rules to govern the creation of spatial data and aid in discovering errors in existing data. In the following exercise, we will be working with the data depicted in the image below, and I will be the arbitrary source of what is known about the relationships among the features in the four feature classes involved. We will be basing our choices of rules on the following:
With a geodatabase topology in place, certain tools are at your disposal that enable you to maintain topological relationships when you edit spatial data. Here, we contrast editing a shared polygon boundary with and without a geodatabase topology set up. It is important to realize that the "shared" boundary between two polygons is actually a duplicate set of line segments: each polygon boundary is complete and independent of the adjacent polygon. So, when a shared boundary needs to be edited, one must actually edit the boundaries of two polygons.
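The duplicate-boundary point can be illustrated with two adjacent squares stored as independent vertex lists. The coordinates in this plain-Python sketch are hypothetical, and it does not reflect how ArcGIS stores geometry internally. Moving a shared vertex in only one polygon opens an overlap between the two polygons. Moving it in every polygon that contains it keeps the boundary shared, which is roughly the bookkeeping that topology-aware editing tools handle for you. The sketch uses exact coordinate equality where a real topology would use the cluster tolerance.

```python
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def move_vertex(polygons: Dict[str, Ring], old: Point, new: Point,
                only: Optional[str] = None) -> Dict[str, Ring]:
    """Return copies of the polygons with `old` moved to `new`.

    If `only` names a polygon, just that polygon is edited; otherwise every
    polygon containing the vertex is edited. The input is not modified.
    """
    result = {}
    for name, ring in polygons.items():
        if only is None or name == only:
            result[name] = [new if v == old else v for v in ring]
        else:
            result[name] = list(ring)
    return result


def shared_vertices(a: Ring, b: Ring) -> Set[Point]:
    return set(a) & set(b)


# Two squares whose shared edge runs from (1, 0) to (1, 1); each stores its own copy of that edge.
parcels = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

one_sided = move_vertex(parcels, (1, 1), (1.2, 1), only="A")
print(shared_vertices(one_sided["A"], one_sided["B"]))  # {(1, 0)}: A now overlaps B

both = move_vertex(parcels, (1, 1), (1.2, 1))
print(shared_vertices(both["A"], both["B"]))  # (1, 0) and (1.2, 1), in some order
```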
Because this is not a course in inputting and editing spatial and attribute data, we are choosing to focus on what needs to be done to prepare to implement a geodatabase topology. There will be some errors in spatial data to repair, but it involves going over ground already covered in section A of the lesson.
Unzip the TopologyProject.zip archive. The archive contains a folder named TopologyProject. Within that folder is a geodatabase (TopologyProject.gdb) and a georeferenced image of a map (with ancillary files).
The BuildingFootprints feature class contains some instances of polygon overlap that need to be repaired.
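A topology rule such as Must Not Overlap flags polygons whose interiors intersect. Polygons that merely share an edge or a corner are not flagged. The plain-Python sketch below shows one geometric test for the simple case of convex footprints, using the separating axis theorem. It is not Esri's algorithm, it does not handle non-convex (for example, L-shaped) buildings, and the footprints are hypothetical. Rings may be listed with or without a repeated closing vertex.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def _edge_normals(ring: Ring) -> Iterator[Point]:
    """Yield a (non-normalized) normal vector for each edge of the ring."""
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        axis = (y1 - y2, x2 - x1)
        if axis != (0, 0):  # skip zero-length edges, e.g. a repeated closing vertex
            yield axis


def _project(ring: Ring, axis: Point) -> Tuple[float, float]:
    dots = [x * axis[0] + y * axis[1] for x, y in ring]
    return min(dots), max(dots)


def interiors_overlap(a: Ring, b: Ring) -> bool:
    """Separating axis test for two convex polygons.

    Polygons that only touch along an edge or at a corner are NOT reported as overlapping.
    """
    for axis in list(_edge_normals(a)) + list(_edge_normals(b)):
        a_min, a_max = _project(a, axis)
        b_min, b_max = _project(b, axis)
        if a_max <= b_min or b_max <= a_min:
            return False  # found an axis on which the polygons are separated or merely touching
    return True


def must_not_overlap(footprints: Dict[str, Ring]) -> List[Tuple[str, str]]:
    """Return every pair of footprints whose interiors overlap."""
    return [(id_a, id_b)
            for (id_a, a), (id_b, b) in combinations(footprints.items(), 2)
            if interiors_overlap(a, b)]


# Hypothetical footprints: A and B overlap; C shares an edge with A.
footprints = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(1, 1), (3, 1), (3, 3), (1, 3)],
    "C": [(-2, 0), (0, 0), (0, 2), (-2, 2)],
}
print(must_not_overlap(footprints))  # [('A', 'B')]
```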
To find and repair the errors in the BuildingFootprints feature class, you are going to have to create and employ a geodatabase topology. You know from section A how to create a geodatabase topology, and you know that in order to do so, the data in question has to reside within a feature dataset. What we did not explicitly go over in the lesson was how to (1) create a feature dataset and (2) get existing data, in this case the feature class contained in TopologyProject.gdb, into a feature dataset that you create. But that's what I want you to do. Given that Geography 484 or comparable experience is the prerequisite for this course, you should be able to do it.
Once you have accomplished that, proceed to find and correct the overlapping building footprint polygons. All of the offending features will be in the area covered by the included georeferenced map image. You can use it as a reference to make sure you are correcting the polygons properly.
When you finish, zip up your version of the TopologyProject.gdb and upload it to the Lesson 5 Drop Box.
In Geography 484: GIS Database Development, you were introduced to attribute domains; however, it is possible to go through that course without having actually implemented them. So we will go over Coded Value Domains and Range Domains in the following section. Some of the material will be review for some of you, but even if you did work with attribute domains in Geography 484, you are apt to be exposed to functionality associated with them that was not covered in that course.
Follow this link to download the data for this part of the lesson: AttributeDomains.zip [1]
The zip archive contains two Esri File Geodatabases:
A coded value domain allows you to choose from a list of values that are allowed in a field. A range domain allows you to specify a valid range for values in a numeric field. Both types of attribute domain thus provide means of enforcing data integrity.
Additional behaviors that can accompany attribute domains allow us to define what values get assigned to the field when a feature is split into two or when two features are merged into one. According to the Esri documentation, "...merge policies are not evaluated when merging features in the Editor. However, developers can take advantage of merge policies when writing their own merge implementations." The scope of this lesson does not include developer-level exercises, so we will visit only the use of split policies.
For the homework deliverable for this part of the lesson, I am going to have you create two attribute domains to be assigned to existing fields in a feature class attribute table.
You will use the behavior functionality that the domains give you to find several attribute errors that reside in the dataset.
The data you will use is found in the AttributeDomainsProject.gdb file geodatabase. That geodatabase contains a single feature class named BuildingFootprints.
Because attribute domains are stored in the geodatabase, they can be used by any dataset within that geodatabase: a feature class, a table, or a subtype (covered in the next part of the lesson).
You can now move on to the Subtypes part of the lesson.
Sometimes we have many similar features, such as roads, that we want to assign separate attribute values to, symbolize differently, and so on. In that situation, we can designate them as what are called Subtypes. The alternative would be to create a separate feature class for, in this example, each type of road. Maintaining many separate feature classes in that way can degrade the performance and ease of use of the database.
Subtypes can be created for spatial features in a feature class or for objects (records) in a nonspatial, standalone table. Subtypes can be based on an existing field, or a new field can be created to hold the subtype values.
A subtype is defined by a code and a description. The codes are stored in a field in the attribute table and must be either Short or Long Integer values. Each code value is equated to a descriptive category. Sounds a lot like a coded value domain but, as you will see, they are not the same.
In our exercise example, we will be creating road subtypes from integer values that represent six different road type categories.
Follow this link to download the data for this part of the lesson: Subtypes.zip [6]
The zip archive contains two Esri File Geodatabases:
We are going to place roads in subtype categories according to the values in an existing field that contains integer values that represent six road type categories.
So, instead of creating separate feature classes for the different road types, we put the road line features into subtypes. That gave us the ability to specify unique default values for new features in each road type category. It also let us choose, for each category, whether particular attribute domains are assigned.
As was mentioned above, subtypes can be created for objects (records) in a nonspatial table. In this homework project, that is what you will do.
Investigate the field structure and attribute data in the Encounters table residing in the SubtypesProject.gdb geodatabase. In addition to the OBJECTID field that is automatically created by the GIS, you will find the following five fields: ID, Encounter, Category, EducationLevel, and BloodAlcohol.
The values of 1 and 2 in the Encounter field represent Distant Encounters and Close Encounters, respectively.
The descriptions of the values in the Category field are as follows:
The values in the EducationLevel field are the number of years of school attended.
The values in the BloodAlcohol field are blood alcohol content measures (see the Wikipedia Blood alcohol content page [8]).
Here is what I want you to do for this project:
A geodatabase topology is another construct that is stored within a geodatabase and gives us added control over assessing and maintaining the integrity of our spatial data.
Follow this link to download the data for this part of the lesson: Topology_ArcMap.zip [13]
The zip archive contains the following:
An Esri Personal Geodatabase: geodatabasetopol.mdb
A zip archive: TopologyProject.zip
A geodatabase topology provides a robust way of defining topological relationships among spatial features. It does so by analyzing the coordinate locations of feature vertices, both among features within a feature class and between features in the multiple feature classes taking part in the topology. Therefore, all of the feature classes participating in a geodatabase topology must be in the same coordinate system, and the measurement precision defined for each feature class must also be the same. To ensure that this is the case, all feature classes that take part in a geodatabase topology must reside within what is known as a Feature Dataset. When a feature dataset is created, its coordinate system and precision are defined, and any feature class subsequently added to the feature dataset inherits that coordinate system and precision. In the exercise that follows, you will see that the precision is controlled by the Tolerance and Resolution settings. I encourage you to read more about these topics in the Desktop Help when you have time. Search on "topology rules", and then select the "Topology in ArcGIS" topic entry.
A geodatabase topology is governed by topology "rules" that the user specifies, and those rules are based on knowledge of the relationships between and among the features that will take part in the topology. So, the onus is on the user to understand the data being created or edited so that appropriate rules can be specified. Rules can be added to and removed from a topology as long as the data is not part of an active ArcMap editing session.
A geodatabase topology is stored in the feature dataset and is displayed as a layer in ArcMap. As such, the topology errors it contains are symbolized just like the features in any other map layer in ArcMap. In the Desktop Help, search on "topology rules", and then select the "Geodatabase topology rules and topology error fixes" topic entry. This will take you to a compilation of the types of rules that can be defined, along with suggestions for ways to fix violations of each rule and a description of how the errors are symbolized in the map data frame (as Points, Lines, and Areas). If you are inclined to adorn your walls with GIS help guides, you will find a link to a topology rules poster in the opening passage of the Help page I just referred you to. Whether or not you print it, the poster offers more graphic descriptions and examples of the rules.
Once topology rules have been imposed on the data in a feature dataset, errors are discovered by "validating" the topology. Validation can be done on the entire visible extent of the data or on a smaller, specified area. The latter lets you check just the area you have been editing rather than the entire dataset, which can save time when the dataset is large.
As is mentioned above, one needs to be aware of how the features involved in a geodatabase topology relate to each other in order to be able to define appropriate topology rules to govern the creation of spatial data and aid in discovering errors in existing data. In the following exercise, we will be working with the data depicted in the image below, and I will be the arbitrary source of what is known about the relationships among the features in the four feature classes involved. We will be basing our choices of rules on the following:
With a geodatabase topology in place, certain tools are at your disposal that enable you to maintain topological relationships when you edit spatial data. Here, we contrast editing a shared polygon boundary with and without a geodatabase topology set up. It is important to realize that the "shared" boundary between two polygons is actually a duplicate set of line segments: each polygon boundary is complete and independent of the adjacent polygon. So, when a shared boundary needs to be edited, one must actually edit the boundaries of two polygons.
Because this is not a course in inputting and editing spatial and attribute data, we are choosing to focus on what needs to be done to prepare to implement a geodatabase topology. There will be some errors in spatial data to repair, but it involves going over ground already covered in section A of the lesson.
Unzip the TopologyProject.zip archive. The archive contains a folder named TopologyProject. Within that folder is a map document (TopologyProject.mxd file), a geodatabase (TopologyProject.gdb) and a georeferenced image of a map (with ancillary files). When you unzip the archive, keep the folder and its contents intact. The map document is set to maintain relative paths to the data that it points to, so after you have extracted the folder and you navigate into it and double-click on the TopologyProject.mxd, the map image and the single feature class contained in the geodatabase should be present in the resulting ArcMap session.
The BuildingFootprints feature class contains some instances of polygon overlap that need to be repaired.
To find and repair the errors in the BuildingFootprints feature class, you are going to have to create and employ a geodatabase topology. You know from section A how to create a geodatabase topology, and you know that in order to do so, the data in question has to reside within a feature dataset. What we did not explicitly go over in the lesson was how to (1) create a feature dataset and (2) get existing data, in this case the feature class contained in TopologyProject.gdb, into a feature dataset that you create. But that's what I want you to do. Given that Geography 484 or comparable experience is the prerequisite for this course, you should be able to do it.
Once you have accomplished that, proceed to find and correct the overlapping building footprint polygons. All of the offending features will be in the area covered by the included georeferenced map image. You can use it as a reference to make sure you are performing the corrections to the polygons correctly.
When you finish, zip up your version of the TopologyProject.gdb and upload it to the Lesson 5 Drop Box.
This project is one week in length. Please refer to the Canvas Calendar for the due date.
The various Project deliverables were described along the way throughout the lesson. Below is a list summarizing what you need to submit.
Links
[1] https://www.e-education.psu.edu/spatialdb/sites/www.e-education.psu.edu.spatialdb/files/AttributeDomains.zip
[2] http://desktop.arcgis.com/en/arcmap/latest/tools/data-management-toolbox/table-to-domain.htm
[3] http://desktop.arcgis.com/en/arcmap/latest/tools/data-management-toolbox/an-overview-of-the-domains-toolset.htm
[4] http://www.7-zip.org/
[5] https://creativecommons.org/licenses/by-nc-sa/4.0/
[6] https://www.e-education.psu.edu/spatialdb/sites/www.e-education.psu.edu.spatialdb/files/Subtypes.zip
[7] http://www.ufocasebook.com/Hynek.html
[8] http://en.wikipedia.org/wiki/Blood_alcohol_content
[9] https://www.e-education.psu.edu/spatialdb/sites/www.e-education.psu.edu.spatialdb/files/Topology.zip
[10] https://pro.arcgis.com/en/pro-app/help/data/topologies/topology-in-arcgis.htm
[11] https://pro.arcgis.com/en/pro-app/help/editing/validate-and-fix-geodatabase-topology.htm
[12] https://pro.arcgis.com/en/pro-app/help/editing/pdf/topology_rules_poster.pdf
[13] https://www.e-education.psu.edu/spatialdb/sites/www.e-education.psu.edu.spatialdb/files/Topology_ArcMap.zip