SynMap: A Synthetic Dataset for Text Spotting in Scanned Historical Maps
Topics:
Keywords: Historical Maps,Text Spotting
Abstract Type: Paper Abstract
Authors:
Yijun Lin, University of Minnesota
Jina Kim, University of Minnesota
Zekun Li, University of Minnesota
Yao-Yi Chiang, University of Minnesota
Abstract
Text labels in historical map scans provide valuable information for understanding human activities and their changes over time. Automatic text localization and recognition (i.e., text spotting) on maps facilitates generating useful map metadata and analyzing map content at scale. However, text spotting on scanned historical maps is challenging. First, most existing text spotting methods are trained on out-of-domain datasets (e.g., scene images), where text is typically horizontal and compact. In contrast, text labels on historical maps have varied rotation angles and letter spacings, so these models perform poorly on them. Second, training a machine learning model requires many samples (i.e., text annotations), and annotating text labels in maps requires extensive manual work. This paper proposes a method to automatically generate an unlimited number of historical-style map images with text annotations (named SynMap) for training text spotters, aiming to boost their performance on historical maps. The process of generating SynMap includes 1) using QGIS to place location names from OpenStreetMap on an image, curved along the corresponding geometry (e.g., road lines) and rendered in various styles (e.g., font, letter spacing); 2) automatically retrieving word-level bounding polygons and text labels as annotations; 3) creating varied background styles by exploiting and clustering the large variety of map backgrounds in the David Rumsey collection; 4) merging the synthetic text labels and backgrounds. We show that state-of-the-art text spotting models (e.g., TESTR) trained with SynMap achieve significantly better text detection and recognition performance on historical maps.
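As a rough illustration of steps 1) and 2), the sketch below spaces the characters of a label along a line geometry and derives a word-level bounding polygon from the sampled positions. It is not the QGIS-based pipeline described in the abstract: the function names (`point_along`, `word_polygon`), the fixed character width and height, and the default letter spacing are all hypothetical choices made for this example, and real glyph metrics would vary by font.

```python
import math

def point_along(polyline, dist):
    """Return (x, y, angle) at arc-length `dist` along a polyline."""
    remaining = dist
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue  # skip degenerate (zero-length) segments
        if remaining <= seg:
            t = remaining / seg
            angle = math.atan2(y1 - y0, x1 - x0)
            return x0 + t * (x1 - x0), y0 + t * (y1 - y0), angle
        remaining -= seg
    # Past the end of the line: clamp to the last vertex.
    (x0, y0), (x1, y1) = polyline[-2], polyline[-1]
    return x1, y1, math.atan2(y1 - y0, x1 - x0)

def word_polygon(word, polyline, start=0.0,
                 char_width=10.0, char_height=14.0, letter_spacing=4.0):
    """Place `word` along `polyline` and return a word-level annotation.

    All size parameters are illustrative assumptions (fixed-width glyphs).
    The polygon lists the points on one side of the text baseline in reading
    order, followed by the points on the other side in reverse order.
    """
    side_a, side_b = [], []
    half = char_height / 2.0
    for i in range(len(word)):
        left = start + i * (char_width + letter_spacing)
        for dist in (left, left + char_width):  # left and right glyph edges
            x, y, angle = point_along(polyline, dist)
            nx, ny = -math.sin(angle), math.cos(angle)  # unit normal to the line
            side_a.append((x + nx * half, y + ny * half))
            side_b.append((x - nx * half, y - ny * half))
    return {"text": word, "polygon": side_a + side_b[::-1]}

# Example: a two-segment "road" with a bend, labeled with a place name.
road = [(0.0, 0.0), (60.0, 0.0), (120.0, 40.0)]
annotation = word_polygon("MAIN ST", road, letter_spacing=6.0)
```

Varying `letter_spacing` and the input geometry in such a sketch mimics the wide spacings and rotations the abstract identifies as typical of map labels, and the returned polygon and text string correspond to the kind of word-level annotation a text spotter is trained on.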