Improved file parsing for LLMs
☆3,154 · Nov 13, 2024 · Updated last year
Alternatives and similar repositories for open-parse
Users who are interested in open-parse are comparing it to the libraries listed below.
- Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean… ☆14,011 · Updated this week
- UniTable: Towards a Unified Table Foundation Model ☆523 · Jun 4, 2024 · Updated last year
- Convert PDF to markdown + JSON quickly with high accuracy ☆31,857 · Feb 9, 2026 · Updated 2 weeks ago
- Structured Outputs ☆13,456 · Feb 13, 2026 · Updated 2 weeks ago
- High-performance retrieval engine for unstructured data ☆1,561 · Nov 10, 2025 · Updated 3 months ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,283 · Feb 4, 2026 · Updated 3 weeks ago
- Developer APIs to Accelerate LLM Projects ☆1,743 · Oct 18, 2024 · Updated last year
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,210 · Updated this week
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. ☆7,693 · Nov 7, 2025 · Updated 3 months ago
- Supercharge Your LLM Application Evaluations 🚀 ☆12,667 · Jan 31, 2026 · Updated 3 weeks ago
- DSPy: The framework for programming—not prompting—language models ☆32,381 · Updated this week
- Knowledge Agents and Management in the Cloud ☆4,233 · Feb 17, 2026 · Updated last week
- Structured data extraction and instruction calling with ML, LLM and Vision LLM ☆5,125 · Updated this week
- This repo provides the server-side code for the llmsherpa API to connect to. It includes parsers for various file formats. ☆1,273 · Mar 28, 2025 · Updated 10 months ago
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ☆4,316 · Nov 26, 2025 · Updated 3 months ago
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,800 · Dec 12, 2025 · Updated 2 months ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,859 · May 17, 2025 · Updated 9 months ago
- Structured outputs for LLMs ☆12,428 · Updated this week
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing a… ☆36,458 · Updated this week
- Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time. ☆21,214 · Jan 29, 2026 · Updated 3 weeks ago
- LlamaIndex is the leading document agent and OCR platform ☆47,210 · Updated this week
- OCR & Document Extraction using vision models ☆12,144 · May 20, 2025 · Updated 9 months ago
- Large Action Model framework to develop AI Web Agents ☆6,303 · Jan 21, 2025 · Updated last year
- A Repo For Document AI ☆3,137 · Feb 17, 2026 · Updated last week
- Universal memory layer for AI Agents ☆47,994 · Updated this week
- Get your documents ready for gen AI ☆54,094 · Updated this week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,838 · May 8, 2025 · Updated 9 months ago
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆31,031 · Updated this week
- A guidance language for controlling large language models. ☆21,319 · Feb 13, 2026 · Updated 2 weeks ago
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆9,379 · Jan 3, 2025 · Updated last year
- LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a ch… ☆5,959 · Dec 11, 2025 · Updated 2 months ago
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate ☆7,574 · Jul 14, 2025 · Updated 7 months ago
- The programming language for agentic software. Build, run, and manage multi-agent systems at scale. ☆38,104 · Updated this week
- Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a… ☆24,295 · Updated this week
- Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking, and markdown formatting of scanned PDFs ☆2,862 · Jan 22, 2026 · Updated last month
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents ☆6,438 · Updated this week
- The Open Source Memory Layer For Autonomous Agents ☆2,568 · Oct 22, 2024 · Updated last year
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,278 · Feb 21, 2025 · Updated last year
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,848 · Oct 28, 2025 · Updated 3 months ago