pymupdf / pymupdf4llm
PyMuPDF4LLM
☆1,219 Updated last week
Alternatives and similar repositories for pymupdf4llm
Users who are interested in pymupdf4llm are comparing it to the libraries listed below.
- Developer APIs to Accelerate LLM Projects ☆1,742 Updated last year
- Knowledge Agents and Management in the Cloud ☆4,226 Updated this week
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,475 Updated 4 months ago
- High-performance retrieval engine for unstructured data ☆1,549 Updated 2 months ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆2,442 Updated last week
- ☆1,426 Updated last year
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆921 Updated 2 weeks ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,587 Updated 3 weeks ago
- Extract structured text from PDFs quickly ☆648 Updated 7 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆838 Updated 11 months ago
- The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval ☆1,529 Updated last year
- ☆859 Updated last week
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆1,923 Updated 9 months ago
- Running Docling as an API service ☆1,110 Updated last week
- Lightweight, performant, deep table extraction ☆523 Updated this week
- This repo provides the server side code for llmsherpa API to connect. It includes parsers for various file formats. ☆1,277 Updated 9 months ago
- Improved file parsing for LLMs ☆3,150 Updated last year
- Generic RAG framework to apply the power of LLMs on any given dataset ☆660 Updated last month
- Simple package to extract text with coordinates from programmatic PDFs ☆229 Updated this week
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,361 Updated 3 weeks ago
- Tenacious tool calling built on LangGraph ☆1,007 Updated 5 months ago
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.… ☆465 Updated last month
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆523 Updated 2 months ago
- A toolkit to create optimal Production-ready Retrieval Augmented Generation (RAG) setup for your data ☆1,520 Updated 7 months ago
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL ☆1,132 Updated this week
- ☆246 Updated 7 months ago
- 👩🏻‍🍳 A collection of example notebooks using Haystack ☆517 Updated 2 weeks ago
- 📚 Process PDFs, Word documents and more with spaCy ☆838 Updated 10 months ago
- Parse PDFs into markdown using Vision LLMs ☆455 Updated 3 months ago
- Python bindings to PDFium, reasonably cross-platform. ☆706 Updated last week