deepseek-ai/DeepSeek-OCR-2

deepseek-ai / DeepSeek-OCR-2

Visual Causal Flow

☆3,161

Alternatives and similar repositories for DeepSeek-OCR-2

Users who are interested in DeepSeek-OCR-2 are comparing it to the repositories listed below.

deepseek-ai / DeepSeek-OCR
Contexts Optical Compression
☆23,615 • Jan 27, 2026 • Updated 5 months ago
zai-org / GLM-OCR
GLM-OCR: Accurate × Fast × Comprehensive
☆7,187 • Apr 21, 2026 • Updated 3 months ago
deepseek-ai / Engram
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
☆4,536 • Jan 14, 2026 • Updated 6 months ago
Tencent-Hunyuan / HunyuanOCR
☆1,867 • Updated this week
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,637 • Jan 30, 2026 • Updated 5 months ago
PaddlePaddle / PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…
☆85,960 • Updated this week
studio-dots-ai / dots.ocr
Multilingual Document Layout Parsing in a Single Vision-Language Model
☆9,016 • Mar 24, 2026 • Updated 3 months ago
baidu / Unlimited-OCR
Unlimited OCR Works: Welcome the Era of One-shot Long-horizon Parsing.
☆16,212 • Updated this week
zai-org / GLM-Image
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.
☆993 • Mar 20, 2026 • Updated 4 months ago
opendatalab / MinerU
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
☆75,311 • Updated this week
QwenLM / Qwen3-Omni
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…
☆3,900 • Apr 23, 2026 • Updated 2 months ago
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆86,804 • Updated this week
deepseek-ai / DeepSeek-Math-V2
☆1,594 • Dec 1, 2025 • Updated 7 months ago
allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆19,151 • Mar 25, 2026 • Updated 3 months ago
unslothai / unsloth
Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.
☆68,666 • Updated this week
QwenLM / Qwen-Image
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
☆8,142 • Feb 10, 2026 • Updated 5 months ago
QwenLM / Qwen3.6
Qwen3.6 is the large language model series developed by Qwen team, Alibaba Group.
☆3,705 • Jun 3, 2026 • Updated last month
deepseek-ai / DeepSpec
DeepSpec: a full-stack codebase for training and evaluating speculative decoding algorithms
☆6,719 • Jul 9, 2026 • Updated last week
bytedance / deer-flow
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, s…
☆77,530 • Updated this week
openclaw / openclaw
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
☆383,690 • Updated this week
zai-org / GLM-V
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
☆2,356 • Updated this week
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,583 • Updated this week
Alibaba-NLP / DeepResearch
Tongyi Deep Research, the Leading Open-source Deep Research Agent
☆19,691 • Feb 27, 2026 • Updated 4 months ago
MoonshotAI / Attention-Residuals
☆3,358 • Mar 17, 2026 • Updated 4 months ago
opendatalab / OmniDocBench
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
☆1,900 • Jun 26, 2026 • Updated 3 weeks ago
google / langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi…
☆37,641 • Jul 2, 2026 • Updated 2 weeks ago
ByteDance-Seed / Bagel
Open-source unified multimodal model
☆6,106 • May 4, 2026 • Updated 2 months ago
QwenLM / Qwen-Agent
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
☆16,821 • Mar 4, 2026 • Updated 4 months ago
deepseek-ai / DeepSeek-V3.2-Exp
☆1,620 • Nov 18, 2025 • Updated 8 months ago
modelscope / ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL…
☆14,887 • Updated this week
QwenLM / Qwen3
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
☆27,408 • Jan 9, 2026 • Updated 6 months ago
FireRedTeam / FireRed-OCR
☆289 • Mar 4, 2026 • Updated 4 months ago
datalab-to / chandra
OCR model that handles complex tables, forms, handwriting with full layout.
☆11,727 • Jun 26, 2026 • Updated 3 weeks ago
NousResearch / hermes-agent
The agent that grows with you
☆218,250 • Updated this week
QwenLM / Qwen3-TTS
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streamin…
☆12,520 • Mar 17, 2026 • Updated 4 months ago
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,582 • Jun 14, 2025 • Updated last year
karpathy / autoresearch
AI agents running research on single-GPU nanochat training automatically
☆91,712 • Mar 26, 2026 • Updated 3 months ago
QwenLM / Qwen3-VL-Embedding
☆1,335 • Jun 23, 2026 • Updated 3 weeks ago
infiniflow / ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to creat…
☆85,577 • Updated this week