Yuliang-Liu / MonkeyOCR
A lightweight LMM-based Document Parsing Model
☆5,614Updated this week
Alternatives and similar repositories for MonkeyOCR
Users who are interested in MonkeyOCR are comparing it to the libraries listed below.
- LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual…☆803Updated 2 weeks ago
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay…☆2,189Updated 3 weeks ago
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)☆1,665Updated last week
- 🌐 WebAgent for Information Seeking built by Tongyi Lab: WebWalker & WebDancer & WebSailor & WebShaper & WebWatcher https://arxiv.org/abs…☆6,374Updated this week
- Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai☆4,419Updated this week
- Convert files (PDF, image, Word, PPT, Excel, notebooks, code snippets) to markdown using powerful multimodal LLM☆291Updated 3 months ago
- "Vimo: Chat with Your Videos"☆1,067Updated this week
- "Your Fully-Automated Personal AI Assistant"☆1,104Updated 2 months ago
- Build multimodal language agents for fast prototype and production☆2,546Updated 5 months ago
- LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.☆1,938Updated last week
- Multilingual Document Layout Parsing in a Single Vision-Language Model☆3,671Updated last week
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)☆1,912Updated last month
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic…☆670Updated last week
- AI-Powered Python & Python-Powered AI (Python-Use)☆1,696Updated this week
- Mirix is a multi-agent personal assistant designed to track on-screen activities and answer user questions intelligently. By capturing re…☆1,264Updated this week
- PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation☆2,006Updated 3 months ago
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception☆1,564Updated 4 months ago
- A lightweight OCR system rebuilt on PaddleOCR and decoupled from the PaddlePaddle deep learning training framework, with very fast inference.☆1,353Updated 2 months ago
- "MiniRAG: Making RAG Simpler with Small and Open-Sourced Language Models"☆1,374Updated last week
- PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides [EMNLP 2025]☆1,915Updated 3 weeks ago
- ScreenCoder — Turn any UI screenshot into clean, editable HTML/CSS with full control. Fast, accurate, and easy to customize.☆2,139Updated last month
- Less Code, Lower Barrier, Faster Deployment☆747Updated this week
- A collection of the best currently open-source table recognition models, with refined pre-processing and post-processing and models converted to ONNX.☆801Updated 3 weeks ago
- ☆2,216Updated 3 weeks ago
- AI Manus is a general-purpose AI Agent system that supports running various tools and operations in a sandbox environment.☆948Updated last week
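- MultiAgentPPT is an intelligent presentation generation system that integrates the A2A (Agent2Agent) + MCP (Model Context Protocol) + ADK (Agent Development Kit) architecture, supporting multi-agent collaboration and streaming concurrency mechanisms☆1,273Updated last week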
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆768Updated last week
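- cube studio is an open-source, cloud-native, one-stop AI platform for machine learning, deep learning, and large models, covering the full algorithm pipeline: compute rental platform, online notebook development, drag-and-drop task-flow pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference services, VGPU virtualization, edge computing, annotation platform, automated annotation, large models such as deepseek…☆1,785Updated last week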
- 🚀 EvoAgentX: Building a Self-Evolving Ecosystem of AI Agents☆1,212Updated this week
- BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI…☆9,540Updated this week