pymupdf/pymupdf4llm

PyMuPDF4LLM

☆2,001

Alternatives and similar repositories for pymupdf4llm

Users who are interested in pymupdf4llm are comparing it to the libraries listed below.

pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,283 · Updated this week
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,711 · Updated this week
docling-project / docling
Get your documents ready for gen AI
☆63,561 · Updated this week
Unstructured-IO / unstructured
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean…
☆15,176 · Updated this week
vibrantlabsai / ragas
Supercharge Your LLM Application Evaluations 🚀
☆14,935 · Feb 24, 2026 · Updated 4 months ago
opendatalab / MinerU
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
☆75,311 · Updated this week
enoch3712 / ExtractThinker
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
☆1,585 · Aug 27, 2025 · Updated 10 months ago
run-llama / llama_cloud_services
Knowledge Agents and Management in the Cloud
☆4,260 · May 18, 2026 · Updated 2 months ago
microsoft / graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
☆34,704 · Updated this week
google / langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi…
☆37,641 · Jul 2, 2026 · Updated 2 weeks ago
opendatalab / PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
☆9,797 · Jan 3, 2025 · Updated last year
pymupdf / PyMuPDF-Utilities
Demos, examples and utilities using PyMuPDF
☆723 · Jan 8, 2026 · Updated 6 months ago
BerriAI / litellm
The fastest, litest AI Gateway. Rust core with Python SDK. Call 100+ LLM APIs in OpenAI (or native) format with cost tracking, guardrails…
☆54,241 · Updated this week
infiniflow / ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to creat…
☆85,577 · Updated this week
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,293 · Updated this week
HKUDS / LightRAG
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
☆37,966 · Updated this week
datalab-to / pdftext
Extract structured text from pdfs quickly
☆707 · Jul 8, 2026 · Updated last week
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆10,575 · Updated this week
microsoft / markitdown
Python tool for converting files and office documents to Markdown.
☆167,899 · Updated this week
nlmatics / llmsherpa
Developer APIs to Accelerate LLM Projects
☆1,746 · Oct 18, 2024 · Updated last year
allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆19,151 · Mar 25, 2026 · Updated 3 months ago
pydantic / pydantic-ai
AI Agent Framework, the Pydantic way
☆18,699 · Updated this week
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆86,804 · Updated this week
Cinnamon / kotaemon
An open-source RAG-based tool for chatting with your documents.
☆25,572 · Jul 14, 2026 · Updated last week
567-labs / instructor
structured outputs for llms
☆13,593 · Jul 13, 2026 · Updated last week
run-llama / liteparse
A fast, helpful, and open-source document parser
☆11,719 · Updated this week
run-llama / llama_index
LlamaIndex is the leading document agent and OCR platform
☆50,962 · Updated this week
PrefectHQ / fastmcp
🚀 The fast, Pythonic way to build MCP servers and clients.
☆26,715 · Updated this week
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,130 · Updated this week
unclecode / crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
☆73,824 · Updated this week
unslothai / unsloth
Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.
☆68,666 · Updated this week
Filimoa / open-parse
Improved file parsing for LLM’s
☆3,162 · May 17, 2026 · Updated 2 months ago
confident-ai / deepeval
The LLM Evaluation Framework
☆17,006 · Updated this week
VectifyAI / PageIndex
📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
☆34,152 · Updated this week
mem0ai / mem0
Universal memory layer for AI Agents
☆61,383 · Updated this week
ucbepic / docetl
A system for agentic LLM-powered data processing and ETL
☆3,909 · Updated this week
PaddlePaddle / PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…
☆85,960 · Jul 15, 2026 · Updated last week
opendatalab / DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
☆2,233 · Apr 14, 2025 · Updated last year
bytedance / Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
☆9,037 · Mar 25, 2026 · Updated 3 months ago