jstockwin/py-pdf-parser

jstockwin / py-pdf-parser

A Python tool to help extract information from structured PDFs.

☆425

Alternatives and similar repositories for py-pdf-parser

Users who are interested in py-pdf-parser compare it to the libraries listed below.

pdfminer / pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
☆7,002 · Mar 13, 2026 · Updated 4 months ago
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆10,575 · Updated this week
maxpmaxp / pdfreader
Python API for PDF documents
☆124 · Sep 5, 2024 · Updated last year
pymupdf / PyMuPDF-Utilities
Demos, examples and utilities using PyMuPDF
☆723 · Jan 8, 2026 · Updated 6 months ago
py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,121 · Jun 30, 2026 · Updated 3 weeks ago
chezou / tabula-py
Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame
☆2,315 · Dec 5, 2024 · Updated last year
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,283 · Updated this week
pypdfium2-team / pypdfium2
Python bindings to PDFium, reasonably cross-platform.
☆799 · Updated this week
camelot-dev / camelot
A Python library to extract tabular data from PDFs
☆3,786 · Updated this week
pikepdf / pikepdf
A Python library for reading and writing PDF, powered by QPDF
☆2,766 · Updated this week
borb-pdf / borb
borb is a library for reading, creating and manipulating PDF files in python.
☆3,564 · Updated this week
pmaupin / pdfrw
pdfrw is a pure Python library that reads and writes PDFs
☆1,908 · Apr 29, 2024 · Updated 2 years ago
jcushman / pdfquery
A fast and friendly PDF scraping library.
☆781 · Oct 17, 2023 · Updated 2 years ago
Chronasorg / chronas-api
This API provides authentication and CRUD operations for data used by the Chronas application
☆14 · Jul 12, 2026 · Updated last week
openlegaldata / legal-ner
Named entity recognition for the legal domain
☆43 · Jun 1, 2021 · Updated 5 years ago
jalan / pdftotext
☆1,063 · Jun 28, 2026 · Updated 3 weeks ago
cole-wilson / sailboat
🐍 A quick and easy way to distribute your Python projects!
☆162 · Oct 24, 2023 · Updated 2 years ago
btwael / superstring.py
A fast and memory-optimized string library for heavy-text manipulation in Python
☆251 · Apr 22, 2020 · Updated 6 years ago
camelot-dev / excalibur
A web interface to extract tabular data from PDFs
☆1,810 · May 20, 2026 · Updated 2 months ago
johnhw / funnelplot
Funnel plot
☆45 · Apr 11, 2023 · Updated 3 years ago
HazyResearch / pdftotree
A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved.
☆460 · Aug 3, 2023 · Updated 2 years ago
johndoe31415 / llpdf
Pure-Python low-level PDF manipulation library
☆14 · Jan 29, 2022 · Updated 4 years ago
python-testing-crawler / python-testing-crawler
A crawler for automated functional testing of a web application
☆74 · May 1, 2023 · Updated 3 years ago
multilexsum / dataset
Multi-LexSum is an abstractive summarization dataset for US Civil Rights Lawsuits
☆23 · Dec 15, 2022 · Updated 3 years ago
Layout-Parser / layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
☆5,764 · Aug 15, 2024 · Updated last year
euske / pdfminer
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
☆5,283 · Dec 7, 2022 · Updated 3 years ago
atlanhq / camelot
Camelot: PDF Table Extraction for Humans
☆3,716 · Jan 5, 2023 · Updated 3 years ago
ExtractTable / ExtractTable-py
Python library to extract tabular data from images and scanned PDFs
☆285 · Jul 30, 2024 · Updated last year
idlesign / envbox
Detect environment type and work within.
☆25 · Dec 13, 2025 · Updated 7 months ago
amdp-chauhan / complete-deployable-ml-solution
API-First approach to make Machine Learning solution usable
☆14 · Jan 26, 2019 · Updated 7 years ago
allenai / mmda
multimodal document analysis
☆166 · May 14, 2026 · Updated 2 months ago
allenai / smashed
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi…
☆35 · May 24, 2024 · Updated 2 years ago
ecatkins / xpdf_python
Python wrapper for xpdf
☆19 · Nov 28, 2019 · Updated 6 years ago
stephenc222 / example-ocr-with-multi-modal-llms
An example project demonstrating how to perform OCR with multi-modal LLMs
☆10 · Mar 14, 2024 · Updated 2 years ago
deanmalmgren / textract
extract text from any document. no muss. no fuss.
☆4,670 · Jul 11, 2026 · Updated last week
boristsr / FaceLean
An experiment to use a webcam as a game input device.
☆12 · Nov 22, 2022 · Updated 3 years ago
security-force-monitor / sfm-cms
Platform for sharing complex information about security forces. Powers WhoWasInCommand.com
☆10 · Mar 1, 2024 · Updated 2 years ago
orsinium-labs / svg.py
🎨 Type-safe and powerful Python library to generate SVG files
☆395 · Dec 28, 2025 · Updated 6 months ago
ahawker / scratchdir
Context manager to maintain your temporary directories/files.
☆17 · Jan 23, 2023 · Updated 3 years ago