opendatalab / MinerU
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
☆48,227Updated this week
Alternatives and similar repositories for MinerU
Users who are interested in MinerU are comparing it to the libraries listed below:
- Convert PDF to markdown + JSON quickly with high accuracy☆29,658Updated this week
- A Comprehensive Toolkit for High-Quality PDF Content Extraction☆8,863Updated 10 months ago
- Toolkit for linearizing PDFs for LLM datasets/training☆15,772Updated this week
- OCR & Document Extraction using vision models☆11,925Updated 5 months ago
- Python tool for converting files and office documents to Markdown.☆82,554Updated 2 weeks ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model☆7,988Updated 8 months ago
- OCR, layout analysis, reading order, table recognition in 90+ languages☆18,813Updated 2 weeks ago
- A prompt optimizer that helps you write high-quality prompts☆16,553Updated last week
- 🔥 MaxKB is a powerful, easy-to-use open-source platform for building enterprise-grade agents.☆18,806Updated last week
- [EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with layout fully preserved; supports services such as Google/DeepL/Ollama/OpenAI, …☆29,464Updated last week
- 🍒 Cherry Studio is a desktop client that supports multiple LLM providers.☆34,880Updated last week
- Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…☆62,639Updated this week
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN☆55,294Updated last week
- A simple screen parsing tool towards pure vision based GUI agent☆23,784Updated last month
- FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data process…☆26,167Updated last week
- A powerful tool for creating fine-tuning datasets for LLM☆11,596Updated last week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/☆9,355Updated 6 months ago
- Integrate the DeepSeek API into popular software☆34,290Updated last month
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.☆47,705Updated last week
- 🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data☆66,485Updated this week
- OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched☆31,686Updated last week
- An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.☆27,579Updated last month
- Elegant reading of real-time and hottest news☆13,644Updated last week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks☆6,725Updated 4 months ago
- 🌐 Make websites accessible for AI agents. Automate tasks online with ease.☆72,026Updated this week
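- Hello 算法 (Hello Algo): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart code. The Simplified and Traditional Chinese editions are updated in sync, English version in…☆118,158Updated last week
- ⚡️HivisionIDPhotos: a lightweight and efficient AI algorithm and tool for making ID photos.☆20,087Updated 3 months ago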
- The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra☆19,392Updated this week
- Production-ready platform for agentic workflow development.☆118,085Updated this week
- ⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡☆13,816Updated this week