tabulapdf/tabula

tabulapdf / tabula

Tabula is a tool for liberating data tables trapped inside PDF files

☆7,450

Alternatives and similar repositories for tabula

Users that are interested in tabula are comparing it to the libraries listed below.

tabulapdf / tabula-java
Extract tables from PDF files
☆2,036 · Mar 19, 2025 · Updated last year
chezou / tabula-py
Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame
☆2,315 · Dec 5, 2024 · Updated last year
tabulapdf / tabula-extractor
Extract tables from PDF files
☆358 · May 17, 2016 · Updated 10 years ago
camelot-dev / excalibur
A web interface to extract tabular data from PDFs
☆1,811 · May 20, 2026 · Updated 2 months ago
atlanhq / camelot
Camelot: PDF Table Extraction for Humans
☆3,716 · Jan 5, 2023 · Updated 3 years ago
camelot-dev / camelot
A Python library to extract tabular data from PDFs
☆3,792 · Updated this week
OpenRefine / OpenRefine
OpenRefine is a free, open source power tool for working with messy data and improving it
☆11,931 · Updated this week
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆10,602 · Jul 20, 2026 · Updated last week
ashima / pdf-table-extract
Extract tables from PDF pages.
☆300 · Jun 25, 2020 · Updated 6 years ago
euske / pdfminer
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
☆5,279 · Dec 7, 2022 · Updated 3 years ago
WZBSocialScienceCenter / pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
☆2,255 · Jun 24, 2022 · Updated 4 years ago
pdfminer / pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
☆7,012 · Mar 13, 2026 · Updated 4 months ago
wireservice / csvkit
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
☆6,404 · Jul 23, 2026 · Updated last week
simonw / datasette
An open source multi-tool for exploring and publishing data
☆11,323 · Updated this week
rawgraphs / rawgraphs-app
A web interface to create custom vector-based visualizations on top of RAWGraphs core
☆9,018 · Nov 18, 2025 · Updated 8 months ago
JonathanLink / PDFLayoutTextStripper
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf fi…
☆1,608 · Dec 17, 2023 · Updated 2 years ago
tesseract-ocr / tesseract
Tesseract Open Source OCR Engine (main repository)
☆75,640 · Updated this week
deanmalmgren / textract
extract text from any document. no muss. no fuss.
☆4,679 · Jul 11, 2026 · Updated 2 weeks ago
apache / superset
Apache Superset is a Data Visualization and Data Exploration Platform
☆74,060 · Updated this week
jcushman / pdfquery
A fast and friendly PDF scraping library.
☆781 · Oct 17, 2023 · Updated 2 years ago
explosion / spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆33,785 · May 19, 2026 · Updated 2 months ago
huginn / huginn
Create agents that monitor and act on your behalf. Your agents are standing by!
☆49,713 · Updated this week
ropensci / tabulapdf
Bindings for Tabula PDF Table Extractor Library
☆565 · Jan 3, 2025 · Updated last year
metabase / metabase
The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data
☆48,453 · Updated this week
py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,136 · Updated this week
Quartz / bad-data-guide
An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
☆4,127 · Sep 20, 2021 · Updated 4 years ago
facebookresearch / fastText
Library for fast text representation and classification.
☆26,549 · Mar 22, 2024 · Updated 2 years ago
saulpw / visidata
A terminal spreadsheet multitool for discovering and arranging data
☆9,209 · Jul 15, 2026 · Updated 2 weeks ago
vega / altair
Declarative visualization library for Python
☆10,441 · Updated this week
antonycourtney / tad
A desktop application for viewing and analyzing tabular data
☆3,473 · Mar 5, 2025 · Updated last year
getredash / redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
☆28,720 · Updated this week
vega / vega
A visualization grammar.
☆11,943 · Updated this week
rclone / rclone
"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, A…
☆58,825 · Updated this week
spotify / luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis…
☆18,755 · Jul 18, 2026 · Updated last week
johnkerl / miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
☆9,973 · Updated this week
ocropus-archive / DUP-ocropy
Python-based tools for document analysis and OCR
☆3,466 · May 22, 2021 · Updated 5 years ago
BurntSushi / xsv
A fast CSV command line toolkit written in Rust.
☆10,757 · Apr 24, 2025 · Updated last year
ocrmypdf / OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
☆34,319 · Updated this week
dat-ecosystem / dat
peer-to-peer sharing & live synchronization of files via command line
☆8,231 · May 7, 2023 · Updated 3 years ago