NetCDFaster: A Geospatial Cyberinfrastructure Enhancing Multi-Dimensional Scientific Datasets Access and Visualization Through Machine Learning Optimization
Abstract Type: Paper Abstract
Authors: Zhenlei Song
Abstract
NetCDF is a geoscience data standard for storing multidimensional data, and it is widely used to store, share, and analyze complex datasets in climatology, oceanography, and meteorology. Many previous solutions for optimizing NetCDF data access improve some performance metrics at the expense of others, or do not support the full workflow of sharing, reading, and visualization. In this article, we propose NetCDFaster, a novel solution designed to improve the timeliness and accessibility of locating and extracting multidimensional subsets of NetCDF data. The system employs a CatBoost classifier to recommend the optimal subset-querying interface and parameter settings based on file features, variable features, and user input. It provides a web application interface for uploading NetCDF files and quickly extracting metadata such as attributes, coordinates, and variables. Experimental results show that NetCDFaster achieves an F1-score of 64% in selecting the optimal set of interface parameters. In terms of timeliness, a 5% to 10% improvement is achieved in 90% of cases, and the best results reach an 80% reduction in time consumption. This approach enhances geospatial analysis and visualization by improving the speed and accuracy of multidimensional data range indexing.
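The abstract does not specify which file and variable features the CatBoost classifier consumes. As a hedged illustration only, the Python sketch below assumes a few plausible features for a single hyperslab (index-range) request: the number and fraction of elements selected, the bytes read, and how many storage chunks the request overlaps. It also shows how a relative time saving, such as the 80% best case, is computed. The function names `subset_features` and `relative_time_saving` and the chosen features are hypothetical and are not part of NetCDFaster.

```python
from math import prod


def subset_features(shape, chunks, itemsize, selection):
    """Hypothetical features for one hyperslab request (illustration only).

    shape     -- variable dimensions, e.g. (time, lat, lon)
    chunks    -- chunk length along each dimension (use the full dimension
                 length for contiguous storage)
    itemsize  -- bytes per element (e.g. 4 for float32)
    selection -- (start, stop) index pair per dimension, stop exclusive
    """
    if not (len(shape) == len(chunks) == len(selection)):
        raise ValueError("shape, chunks and selection must have the same rank")
    counts, chunks_touched = [], []
    for n, c, (start, stop) in zip(shape, chunks, selection):
        if not 0 <= start < stop <= n:
            raise ValueError(f"invalid range {start}:{stop} for length {n}")
        counts.append(stop - start)
        # Number of chunks overlapped by the half-open range [start, stop).
        chunks_touched.append((stop - 1) // c - start // c + 1)
    selected = prod(counts)
    return {
        "rank": len(shape),
        "selected_elements": selected,
        "selection_fraction": selected / prod(shape),
        "selected_bytes": selected * itemsize,
        "chunks_touched": prod(chunks_touched),
    }


def relative_time_saving(baseline_s, recommended_s):
    """Fraction of query time saved relative to a baseline configuration."""
    return (baseline_s - recommended_s) / baseline_s


# Hypothetical daily float32 grid chunked one time step at a time;
# request January over a lat/lon window.
feats = subset_features((365, 180, 360), (1, 180, 360), 4,
                        ((0, 31), (40, 80), (100, 200)))
print(feats)
# Hypothetical timings: 10 s baseline vs 2 s with the recommended setting.
print(relative_time_saving(10.0, 2.0))
```

Feature rows of this kind, each labelled with the fastest interface and parameter setting observed for that request, are the sort of tabular input a gradient-boosting classifier such as CatBoost accepts. Which features NetCDFaster actually uses is not stated in the abstract.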
Submitted by: Zhenlei Song, Texas A&M University (songzl@tamu.edu)