GEOG 480
Exploring Imagery and Elevation Data in GIS Applications

Validation of Elevation Data

The approach and methods for vertical accuracy testing of terrain data are very similar to those presented above for orthorectified imagery. Elevations are measured relative to a vertical datum, and the vertical datum itself is an approximation of an ideal such as “mean sea level,” which cannot be exactly and completely known because it is, by definition, an average. We cannot say absolutely that a particular elevation is accurate to within 1 foot, 1 inch, or 1 millimeter of its true value. However, we can express the level of confidence we have in a measurement within a framework of statistical testing. Based on a sample of independent elevation check points, we can state, with a given level of confidence, that any other point in the entire dataset falls within the stated tolerance of its “true” value expressed relative to a particular vertical datum.

In this course, you have been introduced to the distinctly different technologies for capturing terrain data that have matured in recent decades: photogrammetry, lidar, and IFSAR. Because these technologies are still relatively new to both data providers and end users, the QA/QC of terrain data has been the subject of much debate and study. Not only is there a high level of interest in applications that make use of terrain data, there is also a pervasive need to understand the strengths and weaknesses of each technology in order to make good investment decisions in equipment, software, and data.

As discussed above, the QA/QC process gives us insight into the types of errors and artifacts that affect terrain data, caused either by the sensor or by characteristics of the target surface. With terrain data, there was concern from the start that vertical accuracy would vary within a single dataset depending on the type of terrain and land cover being mapped. In other words, there was some recognition that accuracy itself is a spatial variable. While this is undoubtedly true of most spatial datasets, including orthorectified imagery, the discussion and debate about methods of accuracy assessment and reporting have focused heavily on terrain, and it is safe to say that significant refinements and developments will occur in the next decade.

In the final content section of this lesson, you will see that the current standards for elevation data accuracy assessment and reporting are actually called guidelines, and that there are a number of unanswered questions on the table that require further research. Overall, this represents progress, because there is widespread recognition that the quantification of error within a dataset is more complex than a simple calculation of RMSE.

Data Validation

Quality control and assurance for terrain models fall into three categories, similar to those introduced previously for orthorectified imagery:

  • Data integrity, which includes completeness of coverage at the user-defined post-spacing or density, valid files in the user-defined format, and accurate georeferencing information.
  • Spatial accuracy, which for terrain models is primarily vertical accuracy; however, for some products such as breaklines, horizontal accuracy is also relevant.
  • Visual inspection for artifacts and anomalies, which, for terrain data, is not only an aesthetic concern but also affects the accuracy of the terrain surface.

Terrain data come in many different forms (DEM, DTM, DSM, TIN, breaklines, etc.) and formats. It is important to ensure that the user has specified the desired form and format clearly before production begins, because transforming from one format to another after production is time-consuming and may introduce undesirable interpolation errors into the data. It is best to provide users with a small sample area as soon as possible, before full production begins, so they have a chance to use the sample data on their own systems and with their own software.

Quantitative Assessment

As with horizontal checkpoints, the reference elevation data should be at least three times more accurate than the data being tested. The root-mean-square error (RMSE) calculated between the sample dataset and the independent source is converted into a statement of vertical accuracy at an established confidence level, normally 95 percent. Because elevation is a one-dimensional variable, the 95% confidence level is equivalent to the RMSE multiplied by 1.9600. An NSSDA-compliant accuracy statement accompanying a terrain model deliverable would read “Tested ____ (meters, feet) vertical accuracy at 95% confidence level,” where the numerical value supplied is RMSE × 1.9600. This statement of accuracy assumes that no systematic errors or biases are present in the data and that the individual checkpoint errors follow a normal distribution.
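
As a concrete illustration, the minimal Python sketch below computes the RMSE from a handful of checkpoints and forms the NSSDA statement; the elevations are invented for demonstration and are not from an actual project.

    # Minimal sketch of an NSSDA-style vertical accuracy computation.
    # survey_z holds independent surveyed elevations; checked_z holds
    # elevations interpolated from the terrain model at the same points.
    # All values are hypothetical.
    import numpy as np

    survey_z  = np.array([101.32, 98.77, 105.10, 99.45, 102.88])  # reference (m)
    checked_z = np.array([101.41, 98.69, 105.02, 99.60, 102.79])  # terrain model (m)

    errors = checked_z - survey_z           # signed vertical errors
    rmse_z = np.sqrt(np.mean(errors ** 2))  # root-mean-square error

    # NSSDA: vertical accuracy at the 95% confidence level, valid only if
    # the errors are unbiased and normally distributed.
    accuracy_z_95 = 1.9600 * rmse_z
    print(f"Tested {accuracy_z_95:.3f} (meters) vertical accuracy "
          f"at 95% confidence level")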

One of the biggest potential customers for terrain data in the United States is FEMA, in particular its national floodplain mapping program. As topographic lidar emerged as a powerful terrain mapping tool in the mid-to-late 1990s, one of FEMA’s most pressing questions was “how does it perform in the different land cover types that characterize the floodplain?” This question, and FEMA’s potential need for accurate elevation data nationwide, drove the development of guidelines and specifications for lidar acquisition, processing, QA/QC, and accuracy testing. The FEMA guidelines required testing and reporting against independent check points in representative land cover types. The land cover types most commonly identified for terrain model accuracy assessment are open ground; weeds and crops; scrub and shrub; forest; and urban.

The FEMA guidelines are presented in more depth later in the lesson; for the moment, it is relevant to note that early testing of lidar data according to these guidelines revealed several important facts that affect our approach to quantitative accuracy assessment of terrain data. First and foremost, it was discovered that errors in lidar-derived terrain datasets do not follow a normal distribution, except over bare ground. In areas covered by any sort of vegetation, lidar (and radar as well) tends to yield elevations above the ground due to returns off the canopy. In built-up areas, there are many lidar returns on objects above the ground, and these may not all be removed from the bare-earth terrain model, again causing an asymmetric error distribution with more above-ground errors than below-ground errors. Conversely, lidar tends to measure elevations slightly below the ground on the dark asphalt surfaces common to roadways and urban areas. When one studies the error distribution of an entire dataset in detail, it becomes obvious that accuracy not only varies within the dataset due to variation in land cover, but also deviates from a normal error distribution in particular ways depending on the slope, roughness, and composition of the surface. One can reasonably expect radar to have its own set of similar issues.
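
One quick way to see this asymmetry in practice is to compare the mean, median, and skewness of signed checkpoint errors by land cover class; a strong positive skew flags the above-ground bias typical of vegetated areas. The sketch below is illustrative only, and the class labels and error values are hypothetical.

    # Rough diagnostic sketch for asymmetric error distributions.
    import numpy as np

    errors_by_cover = {
        "open ground":   np.array([-0.04, 0.02, 0.05, -0.03, 0.01, -0.01]),
        "scrub & shrub": np.array([0.12, 0.35, 0.08, 0.61, 0.22, 0.15]),
        "urban asphalt": np.array([-0.09, -0.05, -0.11, -0.02, -0.07, -0.04]),
    }

    for cover, e in errors_by_cover.items():
        mean, median = e.mean(), np.median(e)
        # Sample skewness: near 0 for symmetric errors, > 0 when
        # above-ground returns (e.g., canopy) dominate.
        skew = np.mean((e - mean) ** 3) / e.std() ** 3
        print(f"{cover:14s} mean={mean:+.3f} m  "
              f"median={median:+.3f} m  skew={skew:+.2f}")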

In recognition of the fact that errors in lidar-derived terrain models are often not appropriately modeled by a Gaussian distribution, a nonparametric testing method based on the 95th percentile was proposed and implemented in the National Digital Elevation Program (NDEP) Guidelines. According to these guidelines (currently the accepted working standard for most lidar projects in the US, including those conducted for FEMA), fundamental vertical accuracy is measured in bare, open terrain and reported at the 95% confidence level as a function of vertical RMSE; in other land cover types, the supplemental or consolidated vertical accuracy is measured and reported using the 95th percentile method. Both Maune (2007) and the NDEP Guidelines give detailed instructions for the computation of these quantities. Links to those documents are provided on page 8 of this lesson.
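
The sketch below contrasts the two measures, using the common NDEP terms fundamental vertical accuracy (FVA) for bare, open terrain and supplemental vertical accuracy (SVA) for other land cover classes; the checkpoint errors are invented, and Maune (2007) and the NDEP Guidelines remain the authoritative sources for the exact procedures.

    # FVA vs. SVA, per the NDEP approach described above (hypothetical data).
    import numpy as np

    open_terrain_errors = np.array([0.03, -0.05, 0.02, -0.01, 0.04, -0.02])
    vegetated_errors    = np.array([0.10, 0.42, 0.07, 0.55, 0.18, 0.25])

    # Fundamental vertical accuracy: parametric, assumes normal errors.
    fva = 1.9600 * np.sqrt(np.mean(open_terrain_errors ** 2))

    # Supplemental vertical accuracy: nonparametric 95th percentile of
    # the absolute errors, with no distributional assumption.
    sva = np.percentile(np.abs(vegetated_errors), 95)

    print(f"FVA (open terrain): {fva:.3f} m at 95% confidence")
    print(f"SVA (vegetated):    {sva:.3f} m (95th percentile)")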

A sample vertical accuracy assessment report, compiled by an independent contractor for the Pennsylvania statewide lidar program, PAMAP, illustrates the calculation and reporting of quantitative accuracy assessment results.

Qualitative Assessment

The final step in product acceptance is the qualitative assessment. Various 3D visualization techniques are used to view the terrain surface and examine it for artifacts, stray vegetation or buildings, and the like. Water bodies tend to pose special problems and generally require some manual editing during data production, so lakes, rivers, and shorelines should be examined to ensure that, as an elevation surface, they are represented as flat. The elevation used over water bodies is almost never an accurate representation of the actual height of the water surface, because most remote sensing techniques do not measure water heights directly or reliably. In a terrain model product, the elevation of a water body is usually filled in using the mean elevation of the shoreline.
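
As a simple illustration of that fill-in step, the sketch below flattens a hypothetical lake in a small gridded DEM by assigning the mean shoreline elevation to the water cells. Production workflows are more involved and typically work from shoreline breaklines rather than a raster mask.

    # Hydro-flattening sketch: water cells take the mean shoreline elevation.
    import numpy as np
    from scipy.ndimage import binary_dilation

    dem = np.array([[5.0, 5.2, 5.1, 5.3],
                    [5.1, 4.8, 4.6, 5.2],
                    [5.0, 4.7, 4.5, 5.1],
                    [5.2, 5.1, 5.0, 5.3]])
    water_mask = np.zeros(dem.shape, dtype=bool)
    water_mask[1:3, 1:3] = True                  # hypothetical lake cells

    # Shoreline = land cells immediately adjacent to water.
    shoreline = binary_dilation(water_mask) & ~water_mask
    dem[water_mask] = dem[shoreline].mean()      # flatten the lake
    print(dem)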

Breaklines are normally a supplemental deliverable accompanying another type of terrain model (DEM, DTM, or DSM). The most common way to assess the quality and accuracy of breaklines is to superimpose them on the terrain model in a 3-dimensional view. Contours are usually generated from another type of terrain model, so they are typically not checked directly for vertical accuracy. They should, however, be checked to ensure that they do not cross, touch, or contain gaps.
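
Parts of that topology check can be scripted. The sketch below uses the Shapely library (one possible tool, not prescribed by any guideline) to flag contours that cross or touch one another and contours that fail to close; the geometries are hypothetical, and a real check would also allow open contours that terminate at the project boundary.

    # Contour topology sketch: no crossing/touching, no gaps (unclosed lines).
    from shapely.geometry import LineString

    contours = {
        100.0: LineString([(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]),
        110.0: LineString([(1, 1), (3, 1), (3, 3), (1, 3), (1, 1)]),
    }

    elevations = sorted(contours)
    for i, zi in enumerate(elevations):
        if not contours[zi].is_closed:
            print(f"contour {zi}: not closed (possible gap)")
        for zj in elevations[i + 1:]:
            if contours[zi].intersects(contours[zj]):
                print(f"contours {zi} and {zj}: cross or touch")
    print("check complete")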

A sample QA/QC report, compiled by an independent contractor for the Pennsylvania statewide lidar program, PAMAP, provides many good examples of the types of artifacts found during visual inspection of a lidar-derived terrain dataset. It is difficult to automate the identification and correction of these artifacts; therefore, independent review and final data editing usually form an interactive process involving the data producer, the independent reviewer, and the data purchaser.