deepseek-ai / DeepSeek-OCR
Contexts Optical Compression
☆18,351Updated this week
Alternatives and similar repositories for DeepSeek-OCR
Users that are interested in DeepSeek-OCR are comparing it to the libraries listed below
- Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.☆7,090Updated 3 months ago
- Toolkit for linearizing PDFs for LLM datasets/training☆14,689Updated this week
- Multilingual Document Layout Parsing in a Single Vision-Language Model☆5,432Updated 2 weeks ago
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.☆15,465Updated last week
- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models☆3,104Updated 2 weeks ago
- Kimi K2 is the large language model series developed by Moonshot AI team☆8,370Updated last month
- Keep searching, reading webpages, and reasoning until it finds the answer (or exceeds the token budget)☆4,953Updated 3 weeks ago
- ☆8,003Updated this week
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation☆4,301Updated 4 months ago
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding☆5,094Updated 8 months ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention☆3,207Updated 3 months ago
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…☆3,745Updated 4 months ago
- Text-audio foundation model from Boson AI☆7,483Updated last month
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…☆2,762Updated 3 weeks ago
- A research prototype of a human-centered web agent☆7,879Updated last week
- A live stream development of RL tuning for LLM agents☆3,562Updated 3 weeks ago
- LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.☆6,576Updated last week
- The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.☆7,538Updated last week
- ☆3,467Updated 7 months ago
- GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning☆1,713Updated 2 weeks ago
- Build Real-Time Knowledge Graphs for AI Agents☆19,313Updated last week
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.☆12,148Updated last month
- A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi…☆16,530Updated 3 weeks ago
- Tongyi Deep Research, the Leading Open-source Deep Research Agent☆16,360Updated last week
- MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining☆1,602Updated 4 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models☆17,588Updated 8 months ago
- DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execut…☆17,704Updated this week
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.☆2,932Updated 3 months ago
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations☆15,207Updated last week
- Agent S: an open agentic framework that uses computers like a human☆7,693Updated 2 weeks ago