Feature extraction and machine learning techniques for identifying historic urban environmental hazards

Tollefson, J., Frickel, S., and Restrepo, M. I.

Under review

US cities contain an unknown number of undocumented sites left behind by the “manufactured gas” industry, which dominated energy production during the late 19th and early 20th centuries. Although many of these unidentified sites likely contain substantial levels of highly toxic and biologically persistent contamination, locating them remains a major challenge. We propose a new method to efficiently identify manufactured gas production, storage, and distribution infrastructure in bulk by applying feature extraction and machine learning techniques to publicly available scans of historic Sanborn fire insurance maps. Our approach, which relies on a convolutional neural network to classify circular map features, increases the rate of manufactured gas plant (MGP) identification 20-fold compared to unaided visual coding while maintaining a recall rate of nearly 90%.
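
For readers unfamiliar with the two headline metrics, the short Python sketch below shows how recall and a fold increase in identification rate are typically computed. It is an illustration only, not the authors' evaluation code: the function names and every number in it are hypothetical placeholders chosen to make the arithmetic easy to follow, and none of them are taken from the study.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of the sites actually present that the method found."""
    total_present = true_positives + false_negatives
    if total_present == 0:
        raise ValueError("recall is undefined when no sites are present")
    return true_positives / total_present


def fold_increase(baseline_rate: float, assisted_rate: float) -> float:
    """How many times faster the assisted workflow is than the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return assisted_rate / baseline_rate


# Hypothetical counts (NOT from the study): 18 sites found, 2 missed.
example_recall = recall(true_positives=18, false_negatives=2)

# Hypothetical throughput (NOT from the study): map sheets reviewed per hour.
example_speedup = fold_increase(baseline_rate=10.0, assisted_rate=200.0)

print(f"recall = {example_recall:.2f}, speed-up = {example_speedup:.0f}x")
```

Recall is the relevant measure here because a missed site is an undocumented hazard that remains undocumented, so a faster workflow is only useful if it still finds most of the sites that a manual coder would find.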