Leveraging Random Forest on Open-Source, Big Data to Calculate Number of Building Floors
Topics:
Keywords: Big Data, Urban Planning, Machine Learning, Random Forest
Abstract Type: Paper Abstract
Authors:
Clinton William Stipek, Oak Ridge National Laboratory
Ty Frazier, Oak Ridge National Laboratory
Debraj De, Oak Ridge National Laboratory
Brian Wong, Duke University
Abstract
Open-source platforms are an efficient way to access readily available data, yet a high percentage of these data contain null values or are high-dimensional. While null values can be imputed using methods such as mean or median substitution, these methods can degrade the integrity of the underlying data and reduce the accuracy of any model that relies on them. New building height models, which could revolutionize population modeling if readily available, are now being scaled to produce city-wide building height estimates. This development has increased the importance of readily available ground truth data for assessing the effectiveness of these models. However, a limiting factor for large-scale building height models is that ground truthing is expensive or the amount of available data is limited, which restricts the spatial scale to which a model can be applied. Consequently, if the underlying data lead to inaccurate building height predictions, the resulting population metrics will be imprecise. Our research bridges this gap by using a Random Forest approach to accurately infer the number of floors from publicly available, large-scale data with a high percentage of null values. The resulting output has the potential to serve as a ground truth dataset for generating an accurate, country-scale building height model that includes the number of floors.
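The concern about mean substitution can be illustrated with a minimal sketch. It uses only synthetic floor counts and an assumed null rate of roughly 60%; none of the values, names, or rates come from the study itself. Because every null is replaced by the same constant, the spread of the imputed attribute shrinks relative to the observed values.

```python
import random
import statistics

random.seed(0)

# Synthetic floor counts for 1,000 hypothetical buildings (not study data).
floors = [random.choice([1, 1, 2, 2, 3, 5, 8, 12, 20, 40]) for _ in range(1000)]

# Assumed null rate: drop roughly 60% of values to mimic a sparse attribute.
observed = [f if random.random() > 0.6 else None for f in floors]

# Values that survived, and the mean used to fill the gaps.
known = [f for f in observed if f is not None]
fill = statistics.mean(known)

# Mean imputation: every null becomes the same constant.
imputed = [fill if f is None else f for f in observed]

# Population variance before and after imputation.
var_known = statistics.pvariance(known)
var_imputed = statistics.pvariance(imputed)

print("share of nulls:", 1 - len(known) / len(observed))
print("variance of observed values:", var_known)
print("variance after mean imputation:", var_imputed)
```

In this sketch the variance after imputation equals the observed variance scaled by the share of non-null records, so the more nulls a field has, the more of its variability is flattened. This is one way such methods can distort the data that a downstream model learns from.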