Friday, November 7, 2014

Sand Mine Suitability Project: Data Normalization, Geocoding, and Error Assessment

Goals and Objectives


The goal of this exercise was to become familiar with geocoding non-normalized data and combining it with data from sources that were geocoded differently. We were assigned to geocode the addresses of sand mines, merge our results with those of other students, and compare them.

General Methods


We were each assigned a number of sand mines from an Excel table to geocode. The first issue was that the addresses were not normalized: some were given as Wisconsin PLSS (Public Land Survey System) descriptions, while others already had street addresses. To make geocoding possible, I used an online PLSS finder to find a street address for each of the mines I was assigned. (See below for the pre- and post-normalization tables.)
Once I had all of my street addresses, I geocoded them with ArcMap's geocoding tools. The tool matched some of the addresses automatically, but for a number of them I had to use the "Pick address from map" tool. This involved zooming to each mine's location using PLSS quarter-quarter section data. That data was stored on our university's database server, so I first had to connect to it. Once connected, I used the DNR's roads dataset and the PLSS data to locate my mines and match them manually.
When all of my mines were properly geocoded, I exported the results as a shapefile and placed it in a shared folder with all of the other students' geocoded mine shapefiles, some of which contained the same mines I had been assigned. The next step was to merge these shapefiles. However, some of the students' files were corrupted or had been compiled differently, which made merging difficult, so I had to merge a few files at a time and repeat until I had a usable dataset.
Next, I used the Project tool to project the data into a Wisconsin state projected coordinate system. I then queried for all geocoded mines that matched my own and used a point distance tool to generate a table of the distances between my points and the other students' points. The problem was that this tool generated the distance from each of my points to every one of theirs, while I was only interested in the distance from each of my mines to the nearest point mapped by another student.
I then summarized the point distance table on my mine ID field, keeping the minimum distance, to produce a table of my mines and the distance from each one to the nearest mine mapped by other students.
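The two-step approach above (an all-pairs distance table, then a summary that keeps only the minimum per input mine) can be sketched in plain Python. This is not ArcGIS code; it only illustrates the logic. It assumes the points are already projected so that straight-line distance in feet is meaningful, and the IDs and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # projected x, y in feet (assumed)


def point_distance_table(
    mine: Dict[int, Point], others: Dict[int, Point]
) -> List[Tuple[int, int, float]]:
    """Every (input_id, near_id, distance) pair, like an all-pairs distance table."""
    return [
        (in_id, near_id, math.dist(p, q))
        for in_id, p in mine.items()
        for near_id, q in others.items()
    ]


def min_distance_by_input(table: List[Tuple[int, int, float]]) -> Dict[int, Tuple[int, float]]:
    """Summarize on input_id, keeping only the nearest other point and its distance."""
    best: Dict[int, Tuple[int, float]] = {}
    for in_id, near_id, d in table:
        if in_id not in best or d < best[in_id][1]:
            best[in_id] = (near_id, d)
    return best


# Hypothetical coordinates, not data from the lab.
mine = {0: (1000.0, 2000.0), 1: (5000.0, 5000.0)}
others = {10: (1003.0, 2004.0), 11: (5000.0, 5600.0), 12: (9000.0, 9000.0)}
table = point_distance_table(mine, others)
nearest = min_distance_by_input(table)
print(nearest)  # {0: (10, 5.0), 1: (11, 600.0)}
```

The full table grows as (my mines) × (other points), which is why summarizing on the input ID is needed to get one row per mine.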

Results

This image shows some of my mine addresses. Note the ADDRESS_NORMALIZED field that I added alongside the Address field after the normalization process.
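The ADDRESS_NORMALIZED values were found with an online PLSS finder, not with code. As a rough illustration of the first step in normalizing mixed address data, the sketch below splits a PLSS description into structured fields and returns None for rows that already hold a street address. The input format (e.g. "NE 1/4 SW 1/4, Sec 12, T25N, R10W") and the names `parse_plss` and `PlssLocation` are hypothetical assumptions; real spreadsheets vary.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical input format, e.g. "NE 1/4 SW 1/4, Sec 12, T25N, R10W".
PLSS_RE = re.compile(
    r"^(?P<aliquot>(?:[NS][EW]\s*1/4\s*)*)"    # zero or more quarter calls
    r"Sec(?:tion)?\.?\s*(?P<sec>\d{1,2})\s+"   # section number
    r"T\s*(?P<twp>\d{1,2})\s*N\s+"             # Wisconsin townships are all north
    r"R\s*(?P<rng>\d{1,2})\s*(?P<dir>[EW])$",  # range east or west
    re.IGNORECASE,
)


@dataclass(frozen=True)
class PlssLocation:
    quarters: Tuple[str, ...]  # e.g. ("NE", "SW") = NE quarter of the SW quarter
    section: int
    township: int
    range_: int
    range_dir: str


def parse_plss(text: str) -> Optional[PlssLocation]:
    """Split a PLSS description into fields; return None for non-PLSS text."""
    # Treat commas as spaces and collapse repeated whitespace.
    cleaned = " ".join(text.replace(",", " ").split())
    m = PLSS_RE.match(cleaned)
    if m is None:
        return None  # e.g. a row that already holds a street address
    quarters = tuple(
        q.upper() for q in re.findall(r"[NS][EW]", m["aliquot"], re.IGNORECASE)
    )
    return PlssLocation(
        quarters, int(m["sec"]), int(m["twp"]), int(m["rng"]), m["dir"].upper()
    )
```

Structured fields like these still have to be turned into a street address (here, by the PLSS finder) before an address locator can use them.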




This image shows a table of the minimum distance (in ft) from each of my mines (INPUT_FID) to the nearest mine geocoded by other students.

Discussion

As in any geospatial analysis, there were a number of sources of error in this project. There was inherent error in the map projection, scale, and so on, but more important was the operational error associated with data compilation and geocoding. We each had to geocode our mine addresses based on PLSS data. Since many of the mines didn't have an associated street address, we had to search for one and assign it ourselves. This led to discrepancies between our datasets, which is why some values in the table above are quite high. During geocoding, many addresses were automatically matched to their normalized street addresses, but many others required further work: picking a point on the map, to which the geocoding tool would then assign an address. This introduced substantial operational error, because the assigned location depended on exactly where I or another student clicked. The points most likely to be correct are those that originally had a street address and that the geocoding tool matched automatically. These are likely the points with small distances in the table above.
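The reasoning above (small nearest-match distances suggest agreement, large ones suggest a discrepancy) can be made explicit with a simple cutoff. This is a minimal sketch: the 500 ft threshold and the distances are hypothetical, not values from the lab, and a threshold only flags disagreement between students; it cannot say which point is correct.

```python
from typing import Dict, List, Tuple


def classify_by_distance(
    min_dist_ft: Dict[int, float], threshold_ft: float
) -> Tuple[List[int], List[int]]:
    """Split mine IDs into (consistent, discrepant) by nearest-match distance."""
    consistent = sorted(i for i, d in min_dist_ft.items() if d <= threshold_ft)
    discrepant = sorted(i for i, d in min_dist_ft.items() if d > threshold_ft)
    return consistent, discrepant


# Hypothetical distances and cutoff; neither comes from the lab results.
consistent, discrepant = classify_by_distance({0: 5.0, 1: 600.0, 2: 12500.0}, threshold_ft=500.0)
print(consistent, discrepant)  # [0] [1, 2]
```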

Conclusion

This lab was a good introduction to data normalization and geocoding. In GIS, data often comes from different sources, and there are often discrepancies between datasets that must be resolved before analysis. Normalizing data that arrives in a number of formats is a process I should be familiar with. It is also useful to know the geocoding process, so that I can generate spatial data from tabular address data.
