deepseek-ai / DeepSeek-OCR
Contexts Optical Compression
☆21,561 · Updated 2 months ago
Alternatives and similar repositories for DeepSeek-OCR
Users who are interested in DeepSeek-OCR are comparing it to the libraries listed below.
- Multilingual Document Layout Parsing in a Single Vision-Language Model ☆5,936 · Updated this week
- The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025. ☆8,073 · Updated last week
- Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,859 · Updated 6 months ago
- Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud. ☆17,425 · Updated last month
- A research prototype of a human-centered web agent ☆9,513 · Updated last week
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention ☆3,272 · Updated 5 months ago
- Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM! ☆8,800 · Updated last week
- A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi… ☆19,438 · Updated last month
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… ☆3,157 · Updated 2 months ago
- GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning ☆2,092 · Updated last week
- Toolkit for linearizing PDFs for LLM datasets/training ☆16,322 · Updated last week
- gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI ☆19,475 · Updated last month
- LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using the RAG paradigm. ☆10,366 · Updated this week
- Tongyi Deep Research, the Leading Open-source Deep Research Agent ☆17,746 · Updated this week
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆12,764 · Updated 3 months ago
- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models ☆3,479 · Updated last week
- LLM agents built for control. Designed for real-world use. Deployed in minutes. ☆16,805 · Updated this week
- A simple yet powerful agent framework that delivers with open-source models ☆4,031 · Updated this week
- An Autonomous Agentic Framework for Reflective PowerPoint Generation ☆2,966 · Updated this week
- A lightweight LMM-based Document Parsing Model ☆6,391 · Updated 3 weeks ago
- DeepResearchAgent is a hierarchical multi-agent system designed not only for deep research tasks but also for general-purpose task solvin… ☆3,009 · Updated 3 months ago
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay… ☆2,416 · Updated 4 months ago
- MiniMax-M2, a model built for Max coding & agentic workflows. ☆2,089 · Updated last month
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,448 · Updated 2 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,878 · Updated 2 months ago
- RAG-Anything: All-in-One RAG Framework ☆11,296 · Updated this week
- The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra ☆20,078 · Updated 2 weeks ago
- Kimi K2 is the large language model series developed by the Moonshot AI team ☆9,763 · Updated last month
- ☆10,022 · Updated 4 months ago
- s1: Simple test-time scaling ☆6,620 · Updated 6 months ago