Yuliang-Liu / MonkeyOCR
A lightweight LMM-based Document Parsing Model
☆6,353Updated this week
Alternatives and similar repositories for MonkeyOCR
Users who are interested in MonkeyOCR are comparing it to the repositories listed below
- LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual…☆891Updated last month
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay…☆2,407Updated 4 months ago
- AI-Powered Python & Python-Powered AI (Python-Use)☆3,121Updated this week
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)☆1,809Updated 3 months ago
- UltraRAG v2: A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines☆2,290Updated this week
- A lightweight OCR system rebuilt on PaddleOCR and decoupled from the PaddlePaddle deep learning training framework, with very fast inference☆1,558Updated last month
- Multilingual Document Layout Parsing in a Single Vision-Language Model☆5,840Updated last month
- "Your Fully-Automated Personal AI Assistant"☆1,304Updated last month
- [KDD'2026] "VideoRAG: Chat with Your Videos"☆1,346Updated 3 weeks ago
- PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides [EMNLP 2025]☆2,706Updated last week
- ☆1,233Updated last week
- Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai☆4,784Updated this week
- ☆787Updated 2 months ago
- Convert files (PDF, image, Word, PPT, Excel, notebooks, code snippets) to markdown using powerful multimodal LLM☆311Updated 7 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆1,268Updated last week
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)☆1,937Updated last month
- LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.☆7,783Updated last week
- 🚀 EvoAgentX: Building a Self-Evolving Ecosystem of AI Agents☆2,337Updated last week
- "MiniRAG: Making RAG Simpler with Small and Open-Sourced Language Models"☆1,593Updated last month
- Long-form streaming TTS system for multi-speaker dialogue generation☆1,271Updated last month
- ScreenCoder — Turn any UI screenshot into clean, editable HTML/CSS with full control. Fast, accurate, and easy to customize.☆2,513Updated last month
- Build multimodal language agents for fast prototype and production☆2,603Updated 8 months ago
- Collects the best currently open-source table recognition models, improves their pre-processing and post-processing, and converts the models to ONNX☆896Updated 4 months ago
- Easiest and laziest way for building multi-agent LLMs applications.☆3,476Updated last week
- An Open-Source AI Writing Project.☆796Updated 2 weeks ago
- Youtu-GraphRAG boosts cost efficiency, inference accuracy, and cross-domain adaptability, pushing the boundaries of performance in comple…☆957Updated last month
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception☆1,822Updated 7 months ago
- BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI…☆10,545Updated this week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model☆8,030Updated 10 months ago
- DeepAnalyze is the first agentic LLM for autonomous data science. 🎈 Your AI data analyst: automatically analyzes large volumes of data and generates professional analysis reports with one click!☆3,049Updated this week