pypdfium2-team/pypdfium2

Python bindings to PDFium, reasonably cross-platform.

☆803

Alternatives and similar repositories for pypdfium2

Users who are interested in pypdfium2 often compare it to the libraries listed below.

py-pdf / benchmarks
Benchmarking PDF libraries
☆338 · Jul 2, 2025 · Updated last year
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,343 · Updated this week
YinlinHu / pypdfium
A simple python wrapper for PDFium.
☆18 · Dec 6, 2021 · Updated 4 years ago
datalab-to / pdftext
Extract structured text from pdfs quickly
☆710 · Jul 8, 2026 · Updated 3 weeks ago
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables.
☆10,599 · Jul 20, 2026 · Updated last week
pikepdf / pikepdf
A Python library for reading and writing PDF, powered by QPDF
☆2,770 · Jul 16, 2026 · Updated 2 weeks ago
py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,134 · Updated this week
pdfminer / pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
☆7,012 · Mar 13, 2026 · Updated 4 months ago
bblanchon / pdfium-binaries
📰 Binary distribution of PDFium
☆1,437 · Jul 20, 2026 · Updated last week
Belval / pdf2image
A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
☆1,975 · Jul 23, 2024 · Updated 2 years ago
opendatalab / DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
☆2,239 · Apr 14, 2025 · Updated last year
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,176 · Jul 23, 2026 · Updated last week
mindee / doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. Ongo…
☆6,198 · Updated this week
qdrant / fastembed
Fast, Accurate, Lightweight Python library to make State of the Art Embedding
☆3,113 · Jul 22, 2026 · Updated last week
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,994 · Jul 20, 2026 · Updated last week
Unstructured-IO / unstructured
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean…
☆15,210 · Updated this week
docling-project / docling-parse
Simple package to extract text with coordinates from programmatic PDFs
☆326 · Jul 20, 2026 · Updated last week
dhdaines / playa
Parallel and LAzY Analyzer for PDFs 🏖️
☆47 · Apr 28, 2026 · Updated 3 months ago
opendatalab / OmniDocBench
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
☆1,925 · Updated this week
ajrcarey / pdfium-render
A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
☆686 · Updated this week
microsoft / table-transformer
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o…
☆2,933 · Jun 24, 2024 · Updated 2 years ago
docling-project / docling
Get your documents ready for gen AI
☆63,950 · Updated this week
camelot-dev / camelot
A Python library to extract tabular data from PDFs
☆3,791 · Updated this week
opendatalab / PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
☆9,811 · Jan 3, 2025 · Updated last year
pymupdf / pymupdf4llm
PyMuPDF4LLM
☆2,042 · Updated this week
poloclub / unitable
UniTable: Towards a Unified Table Foundation Model
☆534 · Apr 21, 2026 · Updated 3 months ago
matthewwithanm / python-markdownify
Convert HTML to Markdown
☆2,229 · Jun 30, 2026 · Updated 3 weeks ago
Layout-Parser / layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
☆5,767 · Aug 15, 2024 · Updated last year
opendatalab / UniMERNet
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
☆494 · Sep 28, 2025 · Updated 10 months ago
chromium / pdfium
The PDF library used by the Chromium project
☆542 · Nov 19, 2025 · Updated 8 months ago
LlmKira / fast-langdetect
⚡️ 80x faster Fasttext language detection out of the box | Split text by language
☆318 · May 25, 2026 · Updated 2 months ago
jstockwin / py-pdf-parser
A Python tool to help extracting information from structured PDFs.
☆426 · Jul 13, 2026 · Updated 2 weeks ago
deepdoctection / deepdoctection
A Repo For Document AI
☆3,199 · Jun 20, 2026 · Updated last month
pymupdf / PyMuPDF-Utilities
Demos, examples and utilities using PyMuPDF
☆723 · Jan 8, 2026 · Updated 6 months ago
FreeOCR-AI / layoutreader
A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.
☆323 · Aug 15, 2025 · Updated 11 months ago
cbrunet / python-poppler
Python binding to Poppler-cpp pdf library
☆116 · Sep 6, 2024 · Updated last year
innodatalabs / redstork
Parsing PDF files with PDFium
☆12 · Nov 7, 2024 · Updated last year
grobidOrg / grobid
A machine learning software for extracting information from scholarly documents
☆5,034 · Updated this week
rapidfuzz / RapidFuzz
Rapid fuzzy string matching in Python using various string metrics
☆4,038 · Updated this week