Scholarly open access journals, Peer-reviewed, and Refereed Journals, Impact factor 8.14 (Calculate by google scholar and Semantic Scholar | AI-Powered Research Tool) , Multidisciplinary, Monthly, Indexing in all major database & Metadata, Citation Generator, Digital Object Identifier(DOI)
As data-driven decision-making becomes more prevalent, extracting tabular data from scanned documents and images remains a significant challenge. In this paper, we present an automated table extraction pipeline that uses OpenCV for image pre-processing and Tesseract OCR for text extraction. The system applies grayscale conversion, binarization, and morphological operations to detect table lines and isolate text regions, which supports accurate extraction of the tabular data. The extracted text is then organized into a structured table using Python's pandas package and saved as an Excel file. The proposed method is effective for documents with clearly defined tabular structures and serves as a stepping stone toward more complex document analysis systems.
Index Terms— OpenCV, Tesseract OCR, Table Extraction, Python, Image Processing, Document Analysis.
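
The line-detection step named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names (`binarize`, `open_horizontal`, `cell_boxes`, `demo_image`), the fixed threshold of 128, and the synthetic test image are all assumptions made for illustration. The sketch binarizes a grayscale grid, applies a morphological opening with a line-shaped kernel (keeping only ink runs of at least `min_len` pixels) to isolate ruling lines, and pairs adjacent lines into cell bounding boxes. In an OpenCV-based pipeline the opening would typically be done with `cv2.morphologyEx` and a rectangular structuring element; each resulting box would then be cropped and passed to the OCR stage.

```python
"""Dependency-free sketch: locate table ruling lines and cell boxes.

Illustrative only; names, threshold, and test image are assumptions.
"""


def binarize(gray, threshold=128):
    """Map a grayscale grid (0-255) to ink=1 / background=0 (dark pixels are ink)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def open_horizontal(binary, min_len):
    """Opening with a 1 x min_len kernel: keep only ink runs of length >= min_len."""
    out = []
    for row in binary:
        kept = [0] * len(row)
        start = None
        for x, px in enumerate(row + [0]):  # trailing 0 closes a run at the edge
            if px and start is None:
                start = x
            elif not px and start is not None:
                if x - start >= min_len:
                    kept[start:x] = [1] * (x - start)
                start = None
        out.append(kept)
    return out


def transpose(grid):
    """Swap rows and columns so vertical lines can reuse the horizontal opening."""
    return [list(col) for col in zip(*grid)]


def line_positions(mask):
    """Return the centre index of each group of consecutive non-empty rows."""
    positions, group = [], []
    for y, row in enumerate(mask):
        if any(row):
            group.append(y)
        elif group:
            positions.append(sum(group) // len(group))
            group = []
    if group:
        positions.append(sum(group) // len(group))
    return positions


def cell_boxes(gray, min_len):
    """Return a grid of (top, left, bottom, right) boxes bounded by ruling lines."""
    binary = binarize(gray)
    rows = line_positions(open_horizontal(binary, min_len))
    cols = line_positions(open_horizontal(transpose(binary), min_len))
    return [
        [(top, left, bottom, right) for left, right in zip(cols, cols[1:])]
        for top, bottom in zip(rows, rows[1:])
    ]


def demo_image():
    """Synthetic 9 x 13 page: a 2 x 2 table grid plus one dark 'text' pixel."""
    img = [[255] * 13 for _ in range(9)]
    for y in (0, 4, 8):
        img[y] = [0] * 13          # horizontal ruling lines
    for row in img:
        for x in (0, 6, 12):
            row[x] = 0             # vertical ruling lines
    img[2][3] = 0                  # short ink run; removed by the opening
    return img


if __name__ == "__main__":
    for row in cell_boxes(demo_image(), min_len=5):
        print(row)
```

The opening is what separates ruling lines from text: characters produce only short ink runs, so a kernel longer than any glyph stroke leaves the lines and discards the text. Choosing `min_len` is a trade-off, since a value that is too small keeps long strokes and a value that is too large drops short table borders.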
_______________________________________________________________________________________________
"A Computer Vision Powered OCR Framework for Extracting Tabular Data from Scanned PDFs", International Journal of Science & Engineering Development Research (www.ijrti.org), ISSN: 2455-2631, Vol. 10, Issue 4, pp. d126-d130, April 2025. Available: http://www.ijrti.org/papers/IJRTI2504315.pdf
ISSN: 2456-3315 | Impact Factor: 8.14 (calculated by Google Scholar) | Established: 2016