pymupdf/PyMuPDF-Utilities

pymupdf / PyMuPDF-Utilities

Demos, examples and utilities using PyMuPDF

☆723

Alternatives and similar repositories for PyMuPDF-Utilities

Users who are interested in PyMuPDF-Utilities are comparing it to the libraries listed below.

pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,283 · Updated this week
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆10,575 · Updated this week
doc-analysis / DocBank
DocBank: A Benchmark Dataset for Document Layout Analysis
☆652 · Aug 12, 2024 · Updated last year
pdfminer / pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
☆7,002 · Mar 13, 2026 · Updated 4 months ago
jstockwin / py-pdf-parser
A Python tool to help extracting information from structured PDFs.
☆425 · Jul 13, 2026 · Updated last week
py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,121 · Jun 30, 2026 · Updated 3 weeks ago
BobLd / DocumentLayoutAnalysis
Document Layout Analysis resources repos for development with PdfPig.
☆637 · Oct 1, 2023 · Updated 2 years ago
Layout-Parser / layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
☆5,764 · Aug 15, 2024 · Updated last year
pikepdf / pikepdf
A Python library for reading and writing PDF, powered by QPDF
☆2,766 · Updated this week
opendatalab / Miner-PDF-Benchmark
MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.
☆24 · Dec 11, 2024 · Updated last year
Belval / pdf2image
A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
☆1,975 · Jul 23, 2024 · Updated last year
camelot-dev / camelot
A Python library to extract tabular data from PDFs
☆3,786 · Updated this week
prohandler / GS-Bulk-Emails
Google App Scripts that sends a number of emails from the specific number and that tracks the open status of each email
☆17 · Dec 11, 2024 · Updated last year
allenai / pawls
Software that makes labeling PDFs easy.
☆433 · May 13, 2024 · Updated 2 years ago
pypdfium2-team / pypdfium2
Python bindings to PDFium, reasonably cross-platform.
☆799 · Updated this week
tstanislawek / awesome-document-understanding
A curated list of resources for Document Understanding (DU) topic
☆1,525 · Jun 2, 2023 · Updated 3 years ago
maxpmaxp / pdfreader
Python API for PDF documents
☆124 · Sep 5, 2024 · Updated last year
janedoesrepo / pdfreader
Extracting Semi-Structured Data from PDFs on a large scale
☆52 · Jul 7, 2022 · Updated 4 years ago
ibm-aur-nlp / PubTabNet
☆483 · Jul 8, 2025 · Updated last year
Unstructured-IO / unstructured
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean…
☆15,176 · Updated this week
ArtifexSoftware / pdf2docx
Open source Python library for converting PDF to DOCX.
☆3,469 · May 1, 2026 · Updated 2 months ago
deepdoctection / deepdoctection
A Repo For Document AI
☆3,192 · Jun 20, 2026 · Updated last month
JSchoonmaker / PDF-Text-Extraction
☆12 · Mar 24, 2021 · Updated 5 years ago
microsoft / table-transformer
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o…
☆2,930 · Jun 24, 2024 · Updated 2 years ago
deepdoctection / notebooks
Repository for deepdoctection tutorial notebooks
☆53 · Jan 1, 2026 · Updated 6 months ago
internetarchive / archive-pdf-tools
Fast PDF generation and compression. Deals with millions of pages daily.
☆146 · Mar 2, 2026 · Updated 4 months ago
drj11 / pdftables
A library for extracting tables from PDF files
☆93 · Aug 2, 2020 · Updated 5 years ago
pmaupin / pdfrw
pdfrw is a pure Python library that reads and writes PDFs
☆1,908 · Apr 29, 2024 · Updated 2 years ago
allenai / pdffigures2
Given a scholarly PDF, extract figures, tables, captions, and section titles.
☆750 · Mar 10, 2024 · Updated 2 years ago
ArtifexSoftware / mupdf
mupdf mirror
☆2,868 · Updated this week
facebookresearch / nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
☆10,046 · Feb 21, 2025 · Updated last year
grobidOrg / grobid
A machine learning software for extracting information from scholarly documents
☆5,010 · Updated this week
datalab-to / pdftext
Extract structured text from pdfs quickly
☆707 · Jul 8, 2026 · Updated last week
maxdotio / mighty-batch
Highly concurrent and fast content processing for Mighty Inference Server
☆10 · Feb 6, 2023 · Updated 3 years ago
alephdata / languagecodes
A Python helper library to convert between ISO 639 two- and three-letter codes.
☆11 · Nov 13, 2024 · Updated last year
camelot-dev / excalibur
A web interface to extract tabular data from PDFs
☆1,810 · May 20, 2026 · Updated 2 months ago
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,711 · Updated this week
mrpositron / paper2tex
Extracting LaTeX equations from PDF
☆21 · Sep 14, 2023 · Updated 2 years ago
mscarey / legislice
API client for fetching and comparing passages from legislation
☆14 · Jun 29, 2026 · Updated 3 weeks ago