Tencent-Hunyuan/HunyuanOCR

Tencent-Hunyuan / HunyuanOCR

☆1,861

Alternatives and similar repositories for HunyuanOCR

Users who are interested in HunyuanOCR are comparing it to the repositories listed below.

deepseek-ai / DeepSeek-OCR
Contexts Optical Compression
☆23,615 · Jan 27, 2026 · Updated 5 months ago
studio-dots-ai / dots.ocr
Multilingual Document Layout Parsing in a Single Vision-Language Model
☆9,016 · Mar 24, 2026 · Updated 3 months ago
opendatalab / OmniDocBench
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
☆1,898 · Jun 26, 2026 · Updated 3 weeks ago
deepseek-ai / DeepSeek-OCR-2
Visual Causal Flow
☆3,152 · Feb 3, 2026 · Updated 5 months ago
PaddlePaddle / PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…
☆85,885 · Updated this week
Yuliang-Liu / MonkeyOCR
A lightweight LMM-based Document Parsing Model
☆6,604 · Updated this week
Topdu / OpenOCR
OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications, integrates a unified training and evaluation benchmark, commer…
☆1,415 · May 20, 2026 · Updated 2 months ago
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,630 · Jan 30, 2026 · Updated 5 months ago
bytedance / Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
☆9,036 · Mar 25, 2026 · Updated 3 months ago
alibaba / Logics-Parsing
☆1,393 · May 13, 2026 · Updated 2 months ago
chatdoc-com / OCRFlux
OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay…
☆2,524 · Apr 14, 2026 · Updated 3 months ago
opendatalab / MinerU
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
☆75,212 · Updated this week
zai-org / GLM-OCR
GLM-OCR: Accurate × Fast × Comprehensive
☆7,177 · Apr 21, 2026 · Updated 2 months ago
Tencent-Hunyuan / HunyuanImage-3.0
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
☆3,190 · Jun 23, 2026 · Updated 3 weeks ago
SCUT-DLVCLab / OCR-Reasoning
[ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
☆76 · May 26, 2026 · Updated last month
Ucas-HaoranWei / GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
☆8,154 · Feb 10, 2025 · Updated last year
FireRedTeam / FireRed-OCR
☆289 · Mar 4, 2026 · Updated 4 months ago
Tencent / POINTS-Reader
☆197 · Dec 7, 2025 · Updated 7 months ago
RapidAI / RapidOCR
📄 Awesome OCR multiple programing languages toolkits based on ONNX Runtime, OpenVINO, MNN, PaddlePaddle, TensorRT and PyTorch.
☆7,214 · Jul 9, 2026 · Updated last week
studio-dots-ai / dots.mocr
Multimodal OCR: Parse Anything from Documents
☆302 · Mar 20, 2026 · Updated 4 months ago
opendatalab / DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
☆2,232 · Apr 14, 2025 · Updated last year
Tencent-Hunyuan / Hy-MT
☆796 · Jun 1, 2026 · Updated last month
Tencent-Hunyuan / Hunyuan-MT
☆712 · Dec 30, 2025 · Updated 6 months ago
Tongyi-MAI / Z-Image
☆11,775 · Feb 9, 2026 · Updated 5 months ago
ATH-MaaS / Ovis-Image
Ovis-Image is a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stri…
☆318 · May 15, 2026 · Updated 2 months ago
datalab-to / chandra
OCR model that handles complex tables, forms, handwriting with full layout.
☆11,695 · Jun 26, 2026 · Updated 3 weeks ago
QwenLM / Qwen-Image
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
☆8,132 · Feb 10, 2026 · Updated 5 months ago
allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆19,134 · Mar 25, 2026 · Updated 3 months ago
zai-org / GLM-V
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
☆2,356 · Updated this week
NanoNets / docext
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
☆2,032 · Mar 17, 2026 · Updated 4 months ago
Alibaba-NLP / DeepResearch
Tongyi Deep Research, the Leading Open-source Deep Research Agent
☆19,691 · Feb 27, 2026 · Updated 4 months ago
baidu / Unlimited-OCR
Unlimited OCR Works: Welcome the Era of One-shot Long-horizon Parsing.
☆14,678 · Jul 3, 2026 · Updated 2 weeks ago
QwenLM / Qwen3-Omni
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…
☆3,900 · Apr 23, 2026 · Updated 2 months ago
modelscope / FunASR
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,364 · Updated this week
opendatalab / mineru-vl-utils
A Python package for interacting with the MinerU Vision-Language Model.
☆136 · Jun 11, 2026 · Updated last month
IDEA-Research / Rex-Omni
[CVPR2026] Detect Anything via Next Point Prediction
☆1,507 · Feb 22, 2026 · Updated 4 months ago
Tencent / WeKnora
Open-source LLM knowledge platform: turn raw documents into a queryable RAG, an autonomous reasoning agent, and a self-maintaining Wiki.
☆18,579 · Updated this week
opendatalab / UniMERNet
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
☆492 · Sep 28, 2025 · Updated 9 months ago
OpenGVLab / InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
☆10,098 · Sep 22, 2025 · Updated 9 months ago