camelot-dev/excalibur

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/camelot-dev/excalibur)

camelot-dev / excalibur

A web interface to extract tabular data from PDFs

☆1,810

Alternatives and similar repositories for excalibur

Users that are interested in excalibur are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

camelot-dev / camelot
View on GitHub
A Python library to extract tabular data from PDFs
☆3,786Updated this week
atlanhq / camelot
View on GitHub
Camelot: PDF Table Extraction for Humans
☆3,716Jan 5, 2023Updated 3 years ago
tabulapdf / tabula
View on GitHub
Tabula is a tool for liberating data tables trapped inside PDF files
☆7,446Mar 14, 2025Updated last year
chezou / tabula-py
View on GitHub
Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame
☆2,315Dec 5, 2024Updated last year
jsvine / pdfplumber
View on GitHub
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆10,575Updated this week
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
tabulapdf / tabula-java
View on GitHub
Extract tables from PDF files
☆2,035Mar 19, 2025Updated last year
pdfminer / pdfminer.six
View on GitHub
Community maintained fork of pdfminer - we fathom PDF
☆7,002Mar 13, 2026Updated 4 months ago
WZBSocialScienceCenter / pdftabextract
View on GitHub
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
☆2,255Jun 24, 2022Updated 4 years ago
ashima / pdf-table-extract
View on GitHub
Extract tables from PDF pages.
☆300Jun 25, 2020Updated 6 years ago
py-pdf / pypdf
View on GitHub
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,121Jun 30, 2026Updated 3 weeks ago
euske / pdfminer
View on GitHub
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
☆5,283Dec 7, 2022Updated 3 years ago
pymupdf / PyMuPDF
View on GitHub
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,283Updated this week
Layout-Parser / layout-parser
View on GitHub
A Unified Toolkit for Deep Learning Based Document Image Analysis
☆5,764Aug 15, 2024Updated last year
doc-analysis / TableBank
View on GitHub
TableBank: A Benchmark Dataset for Table Detection and Recognition
☆1,080Aug 12, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
microsoft / table-transformer
View on GitHub
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o…
☆2,930Jun 24, 2024Updated 2 years ago
jcushman / pdfquery
View on GitHub
A fast and friendly PDF scraping library.
☆781Oct 17, 2023Updated 2 years ago
deanmalmgren / textract
View on GitHub
extract text from any document. no muss. no fuss.
☆4,670Jul 11, 2026Updated last week
axa-group / Parsr
View on GitHub
Transforms PDF, Documents and Images into Enriched Structured Data
☆6,177Mar 20, 2026Updated 4 months ago
simonw / datasette
View on GitHub
An open source multi-tool for exploring and publishing data
☆11,296Jul 14, 2026Updated last week
ocrmypdf / OCRmyPDF
View on GitHub
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
☆34,240Updated this week
explosion / spaCy
View on GitHub
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆33,757May 19, 2026Updated 2 months ago
deepset-ai / haystack
View on GitHub
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a…
☆25,969Updated this week
the-black-knight-01 / Tabulo
View on GitHub
Table Detection and Extraction Using Deep Learning ( It is built in Python, using Luminoth, TensorFlow<2.0 and Sonnet.)
☆197Nov 24, 2022Updated 3 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
borb-pdf / borb
View on GitHub
borb is a library for reading, creating and manipulating PDF files in python.
☆3,564Updated this week
grobidOrg / grobid
View on GitHub
A machine learning software for extracting information from scholarly documents
☆5,010Updated this week
streamlit / streamlit
View on GitHub
Streamlit — A faster way to build and share data apps.
☆45,306Updated this week
JaidedAI / EasyOCR
View on GitHub
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and …
☆29,800Dec 5, 2025Updated 7 months ago
jstockwin / py-pdf-parser
View on GitHub
A Python tool to help extracting information from structured PDFs.
☆425Jul 13, 2026Updated last week
invoice-x / invoice2data
View on GitHub
Extract structured data from PDF invoices
☆2,178Updated this week
pikepdf / pikepdf
View on GitHub
A Python library for reading and writing PDF, powered by QPDF
☆2,766Updated this week
deepdoctection / deepdoctection
View on GitHub
A Repo For Document AI
☆3,192Jun 20, 2026Updated last month
DevashishPrasad / CascadeTabNet
View on GitHub
This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table …
☆1,549Aug 27, 2021Updated 4 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
vega / altair
View on GitHub
Declarative visualization library for Python
☆10,434Updated this week
vaexio / vaex
View on GitHub
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s…
☆8,509Apr 1, 2026Updated 3 months ago
Unstructured-IO / unstructured
View on GitHub
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean…
☆15,176Updated this week
voila-dashboards / voila
View on GitHub
Voilà turns Jupyter notebooks into standalone web applications
☆5,934Jul 6, 2026Updated 2 weeks ago
ibm-aur-nlp / PubTabNet
View on GitHub
☆483Jul 8, 2025Updated last year
HumanSignal / label-studio
View on GitHub
Label Studio is a multi-type data labeling and annotation tool with standardized output format
☆27,896Updated this week
tesseract-ocr / tesseract
View on GitHub
Tesseract Open Source OCR Engine (main repository)
☆75,484Updated this week