deepseek-ai / DeepSeek-OCR
Contexts Optical Compression
☆20,266Updated 3 weeks ago
Alternatives and similar repositories for DeepSeek-OCR
Users who are interested in DeepSeek-OCR are comparing it to the libraries listed below.
- Multilingual Document Layout Parsing in a Single Vision-Language Model☆5,684Updated 2 weeks ago
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.☆2,992Updated 4 months ago
- Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.☆6,033Updated last week
- Text-audio foundation model from Boson AI☆7,620Updated 2 months ago
- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models☆3,181Updated last month
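- This repository contains complete research and analysis materials from reverse engineering Claude Code v1.0.33, including in-depth technical analysis of the obfuscated source code, system architecture documentation, and an implementation blueprint for reconstructing the Claude Code agent system. Key findings include a real-time Steering mechanism, multi-Agent …☆11,268Updated 4 months ago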
- ☆1,200Updated 4 months ago
- This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025☆6,891Updated 6 months ago
- Prompt Orchestration Markup Language☆4,727Updated this week
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…☆2,921Updated last month
- Qwen Code is a coding agent that lives in the digital world.☆15,302Updated this week
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)☆1,801Updated 2 months ago
- Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.☆25,335Updated last month
- LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.☆7,536Updated this week
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.☆16,371Updated last week
- ☆1,585Updated last month
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching☆1,194Updated 3 months ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention☆3,247Updated 4 months ago
- An open protocol enabling communication and interoperability between opaque agentic applications.☆20,659Updated this week
- Get started with building Fullstack Agents using Gemini 2.5 and LangGraph☆17,319Updated 3 weeks ago
- Nano vLLM☆8,748Updated 2 weeks ago
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation☆4,347Updated 4 months ago
- Open-source unified multimodal model☆5,282Updated 3 weeks ago
- A live stream development of RL tuning for LLM agents☆3,610Updated last month
- 🚀 The fast, Pythonic way to build MCP servers and clients☆20,333Updated this week
- Kimi K2 is the large language model series developed by Moonshot AI team☆9,330Updated last week
- ☆8,223Updated last week
- Renderer for the harmony response format to be used with gpt-oss☆4,007Updated 2 weeks ago
- A simple yet powerful agent framework that delivers with open-source models☆3,831Updated last week
- OmniGen2: Exploration to Advanced Multimodal Generation.☆3,940Updated last month