American Association of Geographers

Leveraging Random Forest on Open-Source, Big Data to Calculate Number of Building Floors

Keywords: Big Data, Urban Planning, Machine Learning, Random Forest
Abstract Type: Paper Abstract

Authors:

Clinton William Stipek, Oak Ridge National Laboratory

Ty Frazier, Oak Ridge National Laboratory

Debraj De, Oak Ridge National Laboratory

Brian Wong, Duke University

Abstract

Open-source platforms are an efficient way to access readily available data, yet a high percentage of these data contain null values or have high dimensionality. While null values can be imputed using a variety of methods, such as the mean or median, these methods can decrease the integrity of the underlying data and reduce the accuracy of any model that leverages the data. New building height models, which could revolutionize population modeling if readily available, are now being scaled to develop city-scale building height estimates. This development has increased the importance of readily available ground truth data for assessing the effectiveness of these models. However, one of the limiting factors of large-scale building height models is that ground truthing is expensive, or the amount of available data is limited, which restricts the spatial scale to which a model can be applied. Therefore, if the underlying data lead to inaccurate building height predictions, the resulting population metrics can be imprecise. Our research bridges this gap by using a Random Forest approach to accurately infer the number of floors from publicly available, large-scale data that have a high percentage of null values. This research has the potential to serve as a ground truth dataset for generating an accurate, country-scale building height model that includes the number of floors.
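The concern about mean and median imputation can be illustrated with a minimal sketch. The floor counts below are hypothetical and are not taken from the study's data; the `impute` helper is likewise an illustrative assumption, not the authors' method. It shows how a single tall building pulls the mean fill value well above the typical floor count, while the median stays close to it.

```python
from statistics import mean, median

# Hypothetical floor counts; None marks a null value in the source data.
floors = [1, 2, 2, 3, None, 2, 40, None, 1]

def impute(values, strategy):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# The 40-floor outlier inflates the mean fill value (51 / 7, about 7.3 floors),
# whereas the median fill value remains 2 floors.
mean_filled = impute(floors, "mean")
median_filled = impute(floors, "median")

print("mean fill:  ", mean_filled[4])
print("median fill:", median_filled[4])
```

Either fill value is a single constant applied to every null record, which is one reason a model-based approach such as the Random Forest described above may be preferable when nulls are frequent.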
