Yuliang-Liu / MonkeyOCR
A lightweight LMM-based Document Parsing Model
☆6,234Updated this week
Alternatives and similar repositories for MonkeyOCR
Users who are interested in MonkeyOCR are comparing it to the repositories listed below.
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay…☆2,381Updated 3 months ago
- LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual…☆885Updated last month
- AI-Powered Python & Python-Powered AI (Python-Use)☆3,065Updated this week
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)☆1,801Updated 2 months ago
- "VideoRAG: Chat with Your Videos"☆1,286Updated last month
- A lightweight OCR system rebuilt on PaddleOCR and decoupled from the PaddlePaddle deep learning training framework, with very fast inference☆1,548Updated 3 weeks ago
- Multilingual Document Layout Parsing in a Single Vision-Language Model☆5,684Updated 3 weeks ago
- "Your Fully-Automated Personal AI Assistant"☆1,289Updated last month
- UltraRAG v2: Less Code, Lower Barrier, Faster Deployment! MCP-based low-code RAG framework, enabling researchers to build complex pipelin…☆1,891Updated this week
- Convert files (PDF, image, Word, PPT, Excel, notebooks, code snippets) to markdown using powerful multimodal LLM☆310Updated 6 months ago
- PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides [EMNLP 2025]☆2,253Updated last week
- Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai☆4,750Updated this week
- ScreenCoder — Turn any UI screenshot into clean, editable HTML/CSS with full control. Fast, accurate, and easy to customize.☆2,483Updated last month
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)☆1,931Updated last month
- Build multimodal language agents for fast prototype and production☆2,594Updated 8 months ago
- Youtu-GraphRAG boosts cost efficiency, inference accuracy, and cross-domain adaptability, pushing the boundaries of performance in comple…☆920Updated 3 weeks ago
- ☆776Updated last month
- A collection of the best currently open-source table recognition models, with improved pre-processing and post-processing and the models converted to ONNX☆876Updated 3 months ago
- AI Manus is a general-purpose AI Agent system that supports running various tools and operations in a sandbox environment.☆1,180Updated last week
- ☆2,448Updated 3 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆1,195Updated last week
- LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.☆7,611Updated this week
- 🚀 Truly open-source AI avatar (digital human) toolkit for offline video generation and digital human cloning.☆11,699Updated last month
- Long-form streaming TTS system for multi-speaker dialogue generation☆1,214Updated 3 weeks ago
- BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI…☆10,007Updated this week
- MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks☆8,418Updated last month
- [ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.☆1,882Updated 10 months ago
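- MultiAgentPPT is an intelligent presentation generation system that integrates the A2A (Agent2Agent) + MCP (Model Context Protocol) + ADK (Agent Development Kit) architecture, built on multi-agent collaboration and a streaming concurrency mechanism☆1,399Updated 2 months ago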
- Next-Generation Interactive Intelligent Programming Assistant☆878Updated last year
- An Open-Source AI Writing Project.☆739Updated 3 weeks ago