camelot-dev/camelot

A Python library to extract tabular data from PDFs

☆3,785

Alternatives and similar repositories for camelot

Users who are interested in camelot compare it to the libraries listed below.

camelot-dev / excalibur
A web interface to extract tabular data from PDFs
☆1,810 · May 20, 2026 · Updated last month
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆10,546 · Jun 17, 2026 · Updated 3 weeks ago
chezou / tabula-py
Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame
☆2,315 · Dec 5, 2024 · Updated last year
atlanhq / camelot
Camelot: PDF Table Extraction for Humans
☆3,716 · Jan 5, 2023 · Updated 3 years ago
pdfminer / pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
☆7,001 · Mar 13, 2026 · Updated 4 months ago
tabulapdf / tabula
Tabula is a tool for liberating data tables trapped inside PDF files
☆7,440 · Mar 14, 2025 · Updated last year
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,218 · Updated this week
py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,115 · Jun 30, 2026 · Updated 2 weeks ago
py-pdf / pypdf_table_extraction
A Python library to extract tabular data from PDFs
☆66 · Apr 8, 2025 · Updated last year
microsoft / table-transformer
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o…
☆2,930 · Jun 24, 2024 · Updated 2 years ago
Layout-Parser / layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
☆5,760 · Aug 15, 2024 · Updated last year
Unstructured-IO / unstructured
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean…
☆15,132 · Updated this week
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,482 · Jul 7, 2026 · Updated last week
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,093 · Updated this week
tabulapdf / tabula-java
Extract tables from PDF files
☆2,034 · Mar 19, 2025 · Updated last year
pikepdf / pikepdf
A Python library for reading and writing PDF, powered by QPDF
☆2,761 · Updated this week
streamlit / streamlit
Streamlit — A faster way to build and share data apps.
☆45,230 · Updated this week
DevashishPrasad / CascadeTabNet
This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table …
☆1,549 · Aug 27, 2021 · Updated 4 years ago
mindee / doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
☆6,181 · Updated this week
doc-analysis / TableBank
TableBank: A Benchmark Dataset for Table Detection and Recognition
☆1,080 · Aug 12, 2024 · Updated last year
run-llama / llama_index
LlamaIndex is the leading document agent and OCR platform
☆50,842 · Updated this week
deepset-ai / haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a…
☆25,896 · Updated this week
opendatalab / PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
☆9,785 · Jan 3, 2025 · Updated last year
WZBSocialScienceCenter / pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
☆2,253 · Jun 24, 2022 · Updated 4 years ago
deepdoctection / deepdoctection
A Repo For Document AI
☆3,187 · Jun 20, 2026 · Updated 3 weeks ago
euske / pdfminer
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
☆5,283 · Dec 7, 2022 · Updated 3 years ago
explosion / spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆33,742 · May 19, 2026 · Updated last month
PaddlePaddle / PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…
☆85,472 · Updated this week
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,125 · Updated this week
deanmalmgren / textract
extract text from any document. no muss. no fuss.
☆4,662 · Updated this week
facebookresearch / nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
☆10,046 · Feb 21, 2025 · Updated last year
axa-group / Parsr
Transforms PDF, Documents and Images into Enriched Structured Data
☆6,178 · Mar 20, 2026 · Updated 3 months ago
huggingface / sentence-transformers
State-of-the-Art Embeddings, Retrieval, and Reranking
☆18,909 · Updated this week
HumanSignal / label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
☆27,837 · Updated this week
JaidedAI / EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and …
☆29,752 · Dec 5, 2025 · Updated 7 months ago
drj11 / pdftables
A library for extracting tables from PDF files
☆93 · Aug 2, 2020 · Updated 5 years ago
microsoft / unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆22,158 · Jan 23, 2026 · Updated 5 months ago
ocrmypdf / OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
☆34,180 · Jul 3, 2026 · Updated last week
grobidOrg / grobid
A machine learning software for extracting information from scholarly documents
☆4,992 · Updated this week