Benchmarking Open-Source OCR Engines on Semantic Slide Regions in Educational Videos using a Subset of FITVID Dataset

Purushotham E.*, Kasarapu Ramani**, Shoba Bindu C.***
* Department of Computer Science and Engineering, Jawaharlal Nehru Technological University Kakinada (JNTUK), Kakinada, Andhra Pradesh, India.
** Department of Computer Science and Engineering, GITAM School of Technology, GITAM Deemed University, Bengaluru, Karnataka, India.
*** Department of Computer Science and Engineering, Jawaharlal Nehru Technological University College of Engineering, Ananthapuramu, Andhra Pradesh, India.
Periodicity: July - September 2025

Abstract

Optical Character Recognition (OCR) is widely used to extract text from academic video material, particularly lecture slides. However, most existing OCR evaluations treat documents holistically and ignore the structural and semantic variation within slide content. This paper benchmarks five open-source OCR engines (Tesseract, EasyOCR, PaddleOCR, Keras-OCR, and DocTR) on labeled semantic regions of lecture slides, including titles, text boxes, tables, handwritten notes, headers, and footers. Because of architecture and runtime constraints, DocTR and Keras-OCR were excluded from the final performance comparison, which therefore covers Tesseract, EasyOCR, and PaddleOCR. The study evaluates engine performance across these region categories using Word Error Rate (WER) and Character Error Rate (CER). The findings indicate that no single OCR engine performs best across all categories: Tesseract performs consistently on formatted text regions such as titles and headings, while PaddleOCR is best at recognizing handwritten and tabular content. The results underline the need for region-aware OCR selection in lecture video indexing applications. This work offers a practical benchmark and actionable recommendations for researchers and engineers building searchable educational content platforms.
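For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how WER and CER are commonly computed: the Levenshtein edit distance between the reference and the OCR output, divided by the reference length in words or characters. It is not the authors' evaluation code; the function names `edit_distance`, `wer`, and `cer`, the whitespace tokenization, and the sample strings are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from ref
                curr[j - 1] + 1,     # insertion into ref
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical slide-title region and a hypothetical OCR output for it.
truth = "introduction to neural networks"
ocr_output = "introduction to neutral networks"
print(f"WER = {wer(truth, ocr_output):.3f}, CER = {cer(truth, ocr_output):.3f}")
```

Under this definition both rates can exceed 1.0 when the OCR output contains many spurious insertions. Averaging the scores separately for each region category (titles, tables, handwritten notes, and so on) would give the kind of region-aware comparison the abstract describes.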

Keywords

OCR, Semantic Regions, Educational Videos, WER, CER.

How to Cite this Article?

Purushotham, E., Ramani, K., and Bindu, C. S. (2025). Benchmarking Open-Source OCR Engines on Semantic Slide Regions in Educational Videos using a Subset of FITVID Dataset. i-manager’s Journal on Future Engineering & Technology, 20(4), 9-17.
