xavctn/img2table

img2table is a table identification and extraction Python library for PDFs and images, based on OpenCV image processing.

☆877

Alternatives and similar repositories for img2table

Users who are interested in img2table are comparing it to the libraries listed below.

conjuncts / gmft
Lightweight, performant, deep table extraction
☆537 · Feb 22, 2026 · Updated 4 months ago
microsoft / table-transformer
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o…
☆2,927 · Jun 24, 2024 · Updated 2 years ago
eihli / image-table-ocr
Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.
☆517 · Mar 3, 2021 · Updated 5 years ago
Tan-Junwen / awesome-table-structure-recognition
A Curated List of Awesome Table Structure Recognition (TSR) Research. Including models, papers, datasets and codes. Continuously updating…
☆232 · Sep 9, 2024 · Updated last year
poloclub / unitable
UniTable: Towards a Unified Table Foundation Model
☆532 · Apr 21, 2026 · Updated 2 months ago
deepdoctection / deepdoctection
A Repo For Document AI
☆3,186 · Jun 20, 2026 · Updated 2 weeks ago
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,058 · Updated this week
mindee / doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
☆6,169 · Updated this week
Layout-Parser / layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
☆5,755 · Aug 15, 2024 · Updated last year
DevashishPrasad / CascadeTabNet
This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table …
☆1,548 · Aug 27, 2021 · Updated 4 years ago
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆10,513 · Jun 17, 2026 · Updated 3 weeks ago
cv-small-snails / Awesome-Table-Recognition
A curated list of resources dedicated to table recognition
☆404 · Dec 12, 2024 · Updated last year
LidorPrototype / TableNetTable2df
https://betterprogramming.pub/table-detection-and-extraction-tablenet-deep-learning-model-with-pytorch-from-images-64489e92b641
☆15 · Jul 5, 2023 · Updated 3 years ago
camelot-dev / camelot
A Python library to extract tabular data from PDFs
☆3,773 · Updated this week
Filimoa / open-parse
Improved file parsing for LLMs
☆3,162 · May 17, 2026 · Updated last month
AlibabaResearch / AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team…
☆1,831 · Mar 17, 2026 · Updated 3 months ago
tiwaridipak103 / Table_extraction
☆22 · Jun 22, 2026 · Updated 2 weeks ago
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,152 · Jul 2, 2026 · Updated last week
Unstructured-IO / unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean…
☆15,066 · Jun 24, 2026 · Updated 2 weeks ago
datalab-to / pdftext
Extract structured text from pdfs quickly
☆704 · Updated this week
InternScience / StructEqTable-Deploy
A High-efficiency Open-source Toolkit for Table-to-Latex Task
☆275 · Dec 6, 2025 · Updated 7 months ago
clovaai / donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
☆6,897 · Jul 11, 2024 · Updated last year
ExtractTable / ExtractTable-py
Python library to extract tabular data from images and scanned PDFs
☆285 · Jul 30, 2024 · Updated last year
Alimustoofaa / EasyOCRLabel
☆11 · Nov 16, 2023 · Updated 2 years ago
tstanislawek / awesome-document-understanding
A curated list of resources for Document Understanding (DU) topic
☆1,523 · Jun 2, 2023 · Updated 3 years ago
nlmatics / llmsherpa
Developer APIs to Accelerate LLM Projects
☆1,745 · Oct 18, 2024 · Updated last year
FutureRising007 / Table_Structure_Recognition
Table Structure Recognition
☆83 · Mar 11, 2023 · Updated 3 years ago
PaddlePaddle / PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…
☆84,648 · Jun 26, 2026 · Updated last week
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,195 · Updated this week
facebookresearch / nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
☆10,029 · Feb 21, 2025 · Updated last year
tomassosorio / OCR_tablenet
TableNet Implementation on Pytorch
☆150 · Dec 9, 2022 · Updated 3 years ago
opendatalab / PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
☆9,771 · Jan 3, 2025 · Updated last year
run-llama / llama_cloud_services
Knowledge Agents and Management in the Cloud
☆4,251 · May 18, 2026 · Updated last month
DIGI-VUB / text.alignment
Text Alignment with Smith-Waterman
☆11 · Nov 26, 2025 · Updated 7 months ago
Ucas-HaoranWei / GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
☆8,148 · Feb 10, 2025 · Updated last year
Psarpei / Multi-Type-TD-TSR
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition
☆289 · Sep 5, 2022 · Updated 3 years ago
Tada-AI / pdf_parser
Good enough PDF parser for CPU
☆15 · Aug 9, 2024 · Updated last year
ayanban011 / SwinDocSegmenter
[ICDAR 2023] (Oral) An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation
☆74 · Sep 12, 2024 · Updated last year
jiangnanboy / table_structure_recognition
Uses Swin-Unet (Swin Transformer Unet) to recognize table structure in document images
☆27 · Feb 23, 2024 · Updated 2 years ago