MMGR: A Multi-modal Contrastive Learning method for Multiple Geographic Mapping Tasks with VHR Images and POIs
Topics:
Keywords: Multi-modal Contrastive Learning, VHR images, Points-of-Interest, Urban Function, Population Density, Gross Domestic Products, Model Pretraining
Abstract Type: Paper Abstract
Authors:
Lubin Bai, Institute of Remote Sensing and GIS, Peking University, Beijing, China
Xiuyuan Zhang, Institute of Remote Sensing and GIS, Peking University, Beijing, China
Shihong Du, Institute of Remote Sensing and GIS, Peking University, Beijing, China
Abstract
Very-High-Resolution (VHR) images contain a wealth of physical features of geographic areas but can hardly represent their socio-economic properties, limiting their applicability to social/human-related geographic studies. Additionally, most supervised geographic mapping methods based on VHR images are designed for a specific task, leading to high label dependency and limited task generality. To resolve these two issues, we propose a Multi-Modal contrastive learning method for learning Geographic Representations (MMGR) that carry both physical and socio-economic properties through the interaction of VHR images and POIs, and that can be used in various geographic mapping tasks. MMGR comprises two contrastive learning modules: an intra-modal module, which mines visual features by contrasting different VHR image augmentations, and an inter-modal module, which integrates physical and socio-economic features by contrasting VHR image and POI features. After multi-modal training, the learned representations can be readily applied to multiple geographic mapping tasks by training a simple classifier/regressor with a small number of labeled samples. Three related yet distinct geographic mapping tasks (i.e., mapping urban functional distributions, population density, and gross domestic product) serve as empirical evidence to verify the superiority of MMGR. The results show that MMGR considerably outperforms seven competitive baselines on all three tasks, indicating its effectiveness in fusing VHR images and POIs for multiple geographic mapping tasks. Moreover, MMGR is a competent pre-training method that helps image encoders understand multi-modal geographic information, and it can be further strengthened by fine-tuning with a few labeled samples.
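To make the dual-objective design concrete, the sketch below illustrates (in PyTorch-style Python) how an intra-modal loss over two VHR image augmentations could be combined with an inter-modal loss between image and POI embeddings. This is a minimal illustration, not the authors' released code: the InfoNCE formulation, the encoder interfaces, the temperature value, and the weighting term lambda_inter are assumptions made for clarity.

import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.1):
    # Symmetric InfoNCE loss between two batches of embeddings;
    # matching rows in z_a and z_b are treated as positive pairs.
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def dual_contrastive_loss(image_encoder, poi_encoder,
                          img_view1, img_view2, poi_feats,
                          lambda_inter=1.0):
    # Intra-modal term: contrast two augmented views of the same VHR image.
    # Inter-modal term: contrast an image embedding with the POI embedding
    # of the same geographic unit. lambda_inter is a hypothetical weight.
    z1 = image_encoder(img_view1)
    z2 = image_encoder(img_view2)
    zp = poi_encoder(poi_feats)
    loss_intra = info_nce(z1, z2)
    loss_inter = info_nce(z1, zp)
    return loss_intra + lambda_inter * loss_inter

After pre-training with such an objective, the frozen image encoder's outputs could be fed to a lightweight classifier or regressor for each downstream mapping task, in line with the few-label transfer setting described above.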
Category
Paper Abstract
Description
Submitted By:
Lubin Bai
lbbai@stu.pku.edu.cn
This abstract is part of a session: SpaceTimeAI for Multi-Model Data Fusion