FIZ Karlsruhe
Named Entity Recognition Evaluation in OCR Workflows
Pages
6
Time to read
16 mins
Publication
Language
English
This research article evaluates how well two Named Entity Recognition (NER) tools extract entities from Optical Character Recognition (OCR) output. The authors built a test dataset containing both raw and corrected OCR output, manually annotated with the tags Person, Location, and Organisation. Each NER tool is applied to these outputs, and its performance is measured as precision, recall, and F1 score against the manual annotations. The paper discusses the challenges of OCR-generated text, whose recognition errors often reduce NER accuracy. It also presents two openly available datasets from the Wiedergutmachung project, which aims to contextualize historical knowledge related to National Socialist injustices. The findings clarify the interplay between OCR noise and NER performance and offer insight into how different NER models behave on noisy text. The article concludes by discussing future directions for integrating OCR and NER techniques.
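The comparison against manual annotations described above can be illustrated with a minimal sketch. It assumes entity-level, exact-match scoring, where a predicted entity counts as correct only if its character offsets and its tag both agree with a gold annotation. The article's actual matching scheme is not stated in this summary, so the `Entity` type, the `precision_recall_f1` function, and the sample offsets are hypothetical and serve only to show how the three metrics relate.

```python
from typing import Iterable, NamedTuple, Tuple


class Entity(NamedTuple):
    """Hypothetical entity record: character span plus tag."""
    start: int
    end: int
    label: str  # e.g. "Person", "Location", "Organisation"


def precision_recall_f1(
    gold: Iterable[Entity], predicted: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Exact-match, entity-level scores (an assumed scheme, not the paper's)."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented sample data: three manual annotations, two tool predictions.
gold = [
    Entity(0, 11, "Person"),
    Entity(20, 29, "Location"),
    Entity(35, 48, "Organisation"),
]
predicted = [
    Entity(0, 11, "Person"),        # matches a gold entity
    Entity(20, 29, "Organisation"),  # right span, wrong tag: counts as an error
]

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Exact matching on offsets is the strictest choice. On raw OCR output, character errors shift offsets relative to the corrected text, so a real evaluation would likely need an alignment step or a more lenient matching rule before scores like these become comparable across the two versions.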