A Python tool to help extract information from structured PDFs.
☆427May 25, 2026Updated last month
Alternatives and similar repositories for py-pdf-parser
Users that are interested in py-pdf-parser are comparing it to the libraries listed below.
- Community maintained fork of pdfminer - we fathom PDF☆7,000Mar 13, 2026Updated 3 months ago
- `pdfstructure` detects, splits and organizes the documents text content into its natural structure as envisioned by the author.☆106Apr 1, 2024Updated 2 years ago
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.☆10,473Jun 17, 2026Updated 2 weeks ago
- Python API for PDF documents☆124Sep 5, 2024Updated last year
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files☆10,099Updated this week
- Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame☆2,315Dec 5, 2024Updated last year
- Demos, examples and utilities using PyMuPDF☆720Jan 8, 2026Updated 5 months ago
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.☆10,095Updated this week
- A Python library to extract tabular data from PDFs☆3,767Jun 24, 2026Updated last week
- A Python library for reading and writing PDF, powered by QPDF☆2,750Jun 21, 2026Updated last week
- Python bindings to PDFium, reasonably cross-platform.☆784Updated this week
- pdfrw is a pure Python library that reads and writes PDFs☆1,908Apr 29, 2024Updated 2 years ago
- borb is a library for reading, creating and manipulating PDF files in python.☆3,566Jun 22, 2026Updated last week
- A fast and friendly PDF scraping library.☆781Oct 17, 2023Updated 2 years ago
- 🐍 A quick and easy way to distribute your Python projects!☆162Oct 24, 2023Updated 2 years ago
- Named entity recognition for the legal domain☆43Jun 1, 2021Updated 5 years ago
- ☆1,063Updated this week
- Build a simple command-line interface from your functions☆110May 19, 2026Updated last month
- A web interface to extract tabular data from PDFs☆1,810May 20, 2026Updated last month
- A fast and memory-optimized string library for heavy-text manipulation in Python☆251Apr 22, 2020Updated 6 years ago
- Parsing PDF files with PDFium☆12Nov 7, 2024Updated last year
- Funnel plot☆45Apr 11, 2023Updated 3 years ago
- Parsing pdf tables using YOLOV3☆121Jun 25, 2026Updated last week
- A Unified Toolkit for Deep Learning Based Document Image Analysis☆5,750Aug 15, 2024Updated last year
- extract text from any document. no muss. no fuss.☆4,637May 7, 2026Updated last month
- A crawler for automated functional testing of a web application☆74May 1, 2023Updated 3 years ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six.☆5,286Dec 7, 2022Updated 3 years ago
- Camelot: PDF Table Extraction for Humans☆3,718Jan 5, 2023Updated 3 years ago
- I will be adding different kind of opensource data extraction tools code using python☆10Nov 15, 2024Updated last year
- multimodal document analysis☆166May 14, 2026Updated last month
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi…☆35May 24, 2024Updated 2 years ago
- Detect environment type and work within.☆25Dec 13, 2025Updated 6 months ago
- Python wrapper for xpdf☆19Nov 28, 2019Updated 6 years ago
- 🎨 Type-safe and powerful Python library to generate SVG files☆394Dec 28, 2025Updated 6 months ago
- Extract structured text from pdfs quickly☆700Jun 10, 2026Updated 3 weeks ago
- An experiment to use a webcam as a game input device.☆12Nov 22, 2022Updated 3 years ago
- Context manager to maintain your temporary directories/files.☆17Jan 23, 2023Updated 3 years ago
- A simple python wrapper for PDFium.☆18Dec 6, 2021Updated 4 years ago
- Python binding to Poppler-cpp pdf library☆116Sep 6, 2024Updated last year