opendatalab / MinerU
A high-quality tool for converting PDF to Markdown and JSON: a one-stop, open-source data extraction tool.
☆32,686 Updated this week
Alternatives and similar repositories for MinerU:
Users who are interested in MinerU are comparing it to the repositories listed below:
- Convert PDF to markdown + JSON quickly with high accuracy ☆24,672 Updated this week
- OCR & Document Extraction using vision models ☆11,085 Updated this week
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆7,532 Updated 4 months ago
- Toolkit for linearizing PDFs for LLM datasets/training ☆12,238 Updated this week
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ☆37,357 Updated this week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆17,307 Updated this week
- SOTA Open Source TTS ☆20,921 Updated 3 weeks ago
- Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team | Netflix-level subtitle cut… ☆12,534 Updated this week
- 🍒 Cherry Studio is a desktop client that supports multiple LLM providers. ☆25,353 Updated this week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆7,512 Updated 2 months ago
- Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, m… ☆95,261 Updated this week
- Let your Claude be able to think ☆15,023 Updated last month
- A simple screen parsing tool towards pure vision based GUI agent ☆21,888 Updated last month
- 🔥 Open Source No Code Web Data Extraction Platform. Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥 ☆12,436 Updated this week
- Python tool for converting files and office documents to Markdown. ☆56,041 Updated 3 weeks ago
- Question and Answer based on Anything. ☆13,113 Updated last month
- A generative speech model for daily dialogue. ☆36,024 Updated last month
- RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. ☆51,166 Updated this week
- 🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSe… ☆60,012 Updated this week
- The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more. ☆43,502 Updated this week
- No fortress, purely open ground. OpenManus is Coming. ☆45,114 Updated last week
- FastGPT is a knowledge-based platform built on LLMs, offering a comprehensive suite of out-of-the-box capabilities such as data process… ☆23,859 Updated this week
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ☆13,844 Updated this week
- Spark-TTS Inference Code ☆9,041 Updated 3 weeks ago
- Python scraper based on AI ☆19,425 Updated last week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,497 Updated 3 weeks ago
- Make websites accessible for AI agents ☆58,844 Updated this week
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN ☆41,970 Updated this week
- ⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool. A lightweight algorithm for making AI ID photos. ☆15,676 Updated last month
- Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥 ☆37,861 Updated this week