Yuliang-Liu / MonkeyOCR
A lightweight LMM-based Document Parsing Model
☆4,930Updated this week
Alternatives and similar repositories for MonkeyOCR
Users who are interested in MonkeyOCR are comparing it to the libraries listed below.
- LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual…☆776Updated this week
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay…☆1,767Updated last week
- 🌐 WebAgent for Information Seeking built by Tongyi Lab: WebWalker & WebDancer & WebSailor https://arxiv.org/pdf/2507.02592☆4,019Updated last week
- "Your Fully-Automated Personal AI Assistant"☆1,053Updated 2 weeks ago
- "VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos"☆770Updated 3 weeks ago
- Convert files (PDF, image, Word, PPT, Excel, notebooks, code snippets) to markdown using powerful multimodal LLM☆273Updated 2 months ago
- Build multimodal language agents for fast prototype and production☆2,529Updated 4 months ago
- Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai☆4,211Updated this week
- A lightweight OCR system rebuilt from PaddleOCR and decoupled from the PaddlePaddle deep learning training framework, with very fast inference☆1,237Updated 3 weeks ago
- AI-Powered Python & Python-Powered AI (Python-Use)☆1,450Updated this week
- Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework☆746Updated 2 weeks ago
- 🚀 EvoAgentX: Building a Self-Evolving Ecosystem of AI Agents☆996Updated this week
- Next-Generation Interactive Intelligent Programming Assistant☆856Updated 9 months ago
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)☆1,876Updated last month
- "RAG-Anything: All-in-One RAG System"☆1,616Updated this week
- AI Manus is a general-purpose AI Agent system that supports running various tools and operations in a sandbox environment.☆838Updated this week
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)☆1,511Updated 2 weeks ago
- PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides☆1,731Updated 2 weeks ago
- Chat Multiple PDFs in Zotero AI with Gemini, Grok 4, DeepSeek, GPT, ChatGPT, Claude, OpenRouter, Gemma 3, Qwen 3☆1,778Updated last week
- Real Time High-Fidelity Faceswap☆823Updated last month
- ☆10,546Updated last month
- Easiest and laziest way for building multi-agent LLMs applications.☆2,202Updated this week
- BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI…☆9,112Updated this week
- Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.☆2,773Updated this week
- (ACL-2025 main conference) SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automat…☆269Updated 3 weeks ago
- "MiniRAG: Making RAG Simpler with Small and Open-Sourced Language Models"☆1,243Updated last month
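- cube studio: an open-source, cloud-native, one-stop AI platform for machine learning, deep learning, and large models, covering the full algorithm pipeline, with big data platform integration, online notebook development, drag-and-drop pipeline orchestration of task flows, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, a labeling platform, automated labeling, deepseek and other…☆1,736Updated last week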
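- 🚀 The best-performing mobile [real-time conversational digital human] available online. Supports local deployment and multimodal interaction (voice, text, facial expressions), with response times under 1.5 seconds; suited to livestreaming, teaching, customer service, finance, government services, and other scenarios with very high privacy and real-time requirements. Works out of the box and is developer-friendly.☆7,200Updated this week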
- "AutoAgent: Fully-Automated and Zero-Code LLM Agent Framework"☆5,573Updated 2 weeks ago
- ☆292Updated last week