Yuliang-Liu / MonkeyOCR
A lightweight LMM-based Document Parsing Model
☆6,461Updated this week
Alternatives and similar repositories for MonkeyOCR
Users who are interested in MonkeyOCR are comparing it to the libraries listed below.
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay…☆2,480Updated 6 months ago
- LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual…☆900Updated 3 months ago
- AI-Powered Python & Python-Powered AI (Python-Use)☆3,404Updated this week
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)☆1,851Updated 5 months ago
- UltraRAG v3: A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines☆5,119Updated this week
- MiroThinker is an open source deep research agent optimized for research and prediction. It achieves an 80.8% Avg@8 score on the challengi…☆6,151Updated this week
- An Agentic Framework for Reflective PowerPoint Generation☆3,299Updated last week
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception☆1,981Updated 9 months ago
- A lightweight OCR system rebuilt from PaddleOCR and decoupled from the PaddlePaddle deep learning training framework, with very fast inference☆1,670Updated 3 months ago
- BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI…☆11,087Updated this week
- Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai☆4,856Updated 3 weeks ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆1,479Updated last month
- "Your Fully-Automated Personal AI Assistant"☆1,362Updated 3 months ago
- Multilingual Document Layout Parsing in a Single Vision-Language Model☆7,139Updated last month
- [KDD'2026] "VideoRAG: Chat with Your Videos"☆2,659Updated last month
- Collects the best currently open-source table recognition models, improves pre-processing and post-processing, and converts the models to ONNX☆920Updated 6 months ago
- MiroFlow is an agent framework that enables tool-use agent tasks, featuring a reproducible GAIA score of 82.4%.☆2,442Updated last week
- Convert files (PDF, image, Word, PPT, Excel, notebooks, code snippets) to markdown using powerful multimodal LLM☆323Updated 9 months ago
- ScreenCoder — Turn any UI screenshot into clean, editable HTML/CSS with full control. Fast, accurate, and easy to customize.☆2,555Updated 3 months ago
- ☆876Updated this week
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic…☆1,075Updated last month
- ☆1,523Updated last month
- Long-form streaming TTS system for multi-speaker dialogue generation☆1,329Updated 3 months ago
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)☆1,947Updated 2 weeks ago
- [EMNLP-2024] Build multimodal language agents for fast prototype and production☆2,626Updated 10 months ago
- PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation☆2,366Updated 5 months ago
- AI Manus is a general-purpose AI Agent system that supports running various tools and operations in a sandbox environment.☆1,426Updated 2 months ago
- "MiniRAG: Making RAG Simpler with Small and Open-Sourced Language Models"☆1,708Updated 3 months ago
- ☆2,566Updated 6 months ago
- The Intelligent GUI Agent for Mobile Phones☆1,732Updated last week