Yuliang-Liu / MonkeyOCR
A lightweight LMM-based Document Parsing Model
☆2,802 Updated this week
Alternatives and similar repositories for MonkeyOCR
Users who are interested in MonkeyOCR are comparing it to the libraries listed below
- LAYRA, an enterprise-ready, out-of-the-box solution, unlocks next-generation intelligent systems powered by visual RAG and limitless visual… ☆745 Updated last week
- "VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos" ☆722 Updated last week
- "Your Fully-Automated Personal AI Assistant, and Open-Source & Cost-Efficient Alternative to OpenAI's Deep Research" ☆1,016 Updated 2 months ago
- Build multimodal language agents for fast prototype and production ☆2,512 Updated 3 months ago
- Convert files (PDF, image, Word, PPT, Excel, notebooks, code snippets) to markdown using powerful multimodal LLM ☆263 Updated last month
- AI Manus is a general-purpose AI Agent system that supports running various tools and operations in a sandbox environment. ☆735 Updated this week
- Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai ☆4,078 Updated this week
- A lightweight OCR system rebuilt from PaddleOCR and decoupled from the PaddlePaddle deep learning training framework, with very fast inference ☆1,202 Updated last week
- Next-Generation Interactive Intelligent Programming Assistant ☆846 Updated 8 months ago
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆500 Updated 2 weeks ago
- "MiniRAG: Making RAG Simpler with Small and Free Language Models" ☆1,181 Updated 3 weeks ago
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) ☆1,220 Updated last week
- PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation ☆1,832 Updated last month
- 🚀 EvoAgentX: Building a Self-Evolving Ecosystem of AI Agents ☆931 Updated this week
- Skywork-R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning ☆2,634 Updated 2 weeks ago
- 🌐 WebWalker [ACL2025] & WebDancer [Preprint] ☆1,111 Updated this week
- Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining" ☆629 Updated 4 months ago
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight) ☆1,859 Updated 3 weeks ago
- Easiest and laziest way for building multi-agent LLMs applications. ☆2,030 Updated this week
- Align Anything: Training All-modality Model with Feedback ☆4,085 Updated last month
- AI-Powered Python & Python-Powered AI (Python-Use) ☆1,330 Updated this week
- PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoki… ☆1,212 Updated last month
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆1,367 Updated 2 months ago
- AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient… ☆834 Updated 2 weeks ago
- The official repository of the paper "(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long… ☆568 Updated last month
- An open-sourced end-to-end VLM-based GUI Agent ☆973 Updated 2 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆526 Updated last month
- "RAG-Anything: All-in-One RAG System" ☆475 Updated this week
- ✨✨VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model ☆588 Updated last month
- Real Time High-Fidelity Faceswap ☆810 Updated last month