i-manager Publications

Benchmarking Open-Source OCR Engines on Semantic Slide Regions in Educational Videos using a Subset of FITVID Dataset

Purushotham E.*, Kasarapu Ramani**, Shoba Bindu C.***

* Department of Computer Science and Engineering, Jawaharlal Nehru Technological University Kakinada (JNTUK), Kakinada, Andhra Pradesh, India.

** Department of Computer Science and Engineering, GITAM School of Technology, GITAM Deemed University, Bengaluru, Karnataka, India.

*** Department of Computer Science and Engineering, Jawaharlal Nehru Technological University College of Engineering, Ananthapuramu, Andhra Pradesh, India.

Periodicity: July - September 2025
DOI: https://doi.org/10.26634/jfet.20.4.22287

Abstract

Optical Character Recognition (OCR) plays an important role in extracting text from academic video material, particularly lecture slides. However, most existing OCR evaluations treat documents holistically and do not account for the structural and semantic variation within slide content. This paper benchmarks five open-source OCR engines (Tesseract, EasyOCR, PaddleOCR, Keras-OCR, and DocTR) on labeled semantic regions of lecture slides, including titles, text boxes, tables, handwritten notes, headers, and footers. Because of architecture and runtime constraints, DocTR and Keras-OCR were excluded from the final performance comparison, which therefore covers Tesseract, EasyOCR, and PaddleOCR. The study examines engine performance across these region categories using Word Error Rate (WER) and Character Error Rate (CER) as metrics. The findings indicate that no single OCR engine performs best across all categories: Tesseract performs consistently on formatted text regions such as titles and headings, while PaddleOCR performs best on handwritten and tabular content. The results underline the need for region-aware OCR selection in lecture video indexing applications. This work offers a practical benchmark and actionable recommendations for researchers and engineers building searchable educational content platforms.
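For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of WER and CER under their standard definitions: the Levenshtein edit distance between the OCR output and the ground truth, divided by the length of the ground truth, counted in words for WER and in characters for CER. The function names and the sample strings are illustrative assumptions. The abstract does not state how the authors normalize text (case, punctuation, whitespace) before scoring, so no normalization is applied here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical slide title and a hypothetical OCR reading of it.
    truth = "gradient descent update rule"
    ocr_output = "gradient decent update rule"
    print(f"WER = {wer(truth, ocr_output):.3f}")
    print(f"CER = {cer(truth, ocr_output):.3f}")
```

Both values can exceed 1.0 when the OCR output contains many spurious insertions, which is worth keeping in mind when comparing scores across region types of very different lengths, such as short footers and dense tables.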

Keywords

OCR, Semantic Regions, Educational Videos, WER, CER.

How to Cite this Article?

Purushotham, E., Ramani, K., and Bindu, C. S. (2025). Benchmarking Open-Source OCR Engines on Semantic Slide Regions in Educational Videos using a Subset of FITVID Dataset. i-manager’s Journal on Future Engineering & Technology, 20(4), 9-17. https://doi.org/10.26634/jfet.20.4.22287

