ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records

The objective of ICDAR 2019 HDRC Chinese is to recognize and analyze the layout, and ultimately to detect and recognize the textlines and characters, of a large historical document collection of more than 20,000 pages kindly provided by FamilySearch.

FamilySearch-DB is a collection of Chinese manuscripts chosen for the complexity of their layout in terms of semantic structure and font. All manuscripts are annotated using Aletheia, an advanced system for accurate yet cost-effective ground truthing of large numbers of documents. The annotations of the manuscripts are available in PAGE XML format, a sophisticated XML schema that is a component of the PAGE (Page Analysis and Ground truth Elements) Format Framework.
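
As an illustration of how PAGE XML annotations might be consumed, the following minimal Python sketch extracts the outline polygon and transcription of each textline. It assumes the common PAGE layout in which a TextLine element carries a Coords element with a points attribute ("x1,y1 x2,y2 ...") and a TextEquiv/Unicode element holding the text. The schema version used by the dataset is not stated here, so the code ignores XML namespaces rather than hard-coding one. The helper names and the embedded sample document (including its namespace date, file name, and text) are hypothetical and are not taken from the competition data.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix so the code is independent of the schema version."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points_attr):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points_attr.split()]


def read_text_lines(xml_text):
    """Return one dict per TextLine with its id, polygon, and transcription."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        polygon, text = [], ""
        # Only direct children are inspected, so Coords of nested elements are ignored.
        for child in elem:
            name = local_name(child.tag)
            if name == "Coords" and child.get("points"):
                polygon = parse_points(child.get("points"))
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append({"id": elem.get("id"), "polygon": polygon, "text": text})
    return lines


# Hypothetical, hand-written sample; not an excerpt from FamilySearch-DB.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="sample.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r0">
      <Coords points="10,10 110,10 110,410 10,410"/>
      <TextLine id="r0_l0">
        <Coords points="20,20 60,20 60,400 20,400"/>
        <TextEquiv><Unicode>王氏家譜</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for line in read_text_lines(SAMPLE):
        print(line["id"], line["polygon"], line["text"])
```

Matching on local tag names keeps the sketch usable across PAGE schema versions, at the cost of not validating the document against any particular schema.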

We propose three tasks for this competition:
Task 1: Handwritten Character Recognition on extracted textlines
Task 2: Layout Analysis on structured historical document images
Task 3: Complete, integrated textline detection and recognition on a large dataset


Foteini Simistira Liwicki 
Rajkumar Saini
Marcus Liwicki