Using recent advances in deep learning and knowledge management, we will develop a tool to manage and analyze about 220,000 pages of digital images of seventeenth-century manuscripts held at the Archivo General de la República Argentina (National Archives) in Buenos Aires. This software will enable twenty-first-century scholars to read and analyze seventeenth-century Spanish American notary records quickly and to find relevant content in these documentary collections efficiently.
The documents contained in this depository combine a large variety of handwritten scripts.
Scans contain different types of noise, including discoloration, stains, ink bleeds, and smudges.
Written in cursive, historical scripts usually employ irregular characters and capitalization, abbreviations, archaic spelling, and linked words.
Preprocessing techniques are applied to clean the images without affecting the written content.
Paleography experts actively engage in the process of information extraction to obtain accurate information from the images.
Optical character recognition (OCR) automatically converts printed or handwritten text into machine-readable, editable, and searchable text. Researchers apply a range of methods to enable OCR. In recent years, deep learning has achieved remarkable success in image understanding and classification, image segmentation, speech recognition, and natural language processing.
Acknowledgments
We thank the National Endowment for the Humanities (NEH, Grant No. HAA-271747-20 and Grant No. HAA-287903-22), the Missouri Institute for Defense and Energy (UMKC MIDE), the UMKC Funding for Excellence Program, the UMKC/IDEAS Collaborative Data Science Grant, and the University of Missouri System Tier 3 Strategic Investment Grant for supporting this project.
This is an ongoing collaboration between the University of Missouri-Kansas City, the University of Missouri-Columbia, and the National Archives of Argentina.