Tencent-Hunyuan / HunyuanOCR
☆1,523 Updated last month
Alternatives and similar repositories for HunyuanOCR
Users who are interested in HunyuanOCR are comparing it to the libraries listed below.
- Visual Causal Flow ☆2,220 Updated last week
- GLM-OCR: Accurate × Fast × Comprehensive ☆996 Updated this week
- ☆876 Updated this week
- MAI-UI: Real-World Centric Foundation GUI Agents ranging from 2B to 235B ☆1,626 Updated 2 weeks ago
- GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning ☆2,162 Updated 2 weeks ago
- Cook up amazing multimodal AI applications effortlessly with MiniCPM-o ☆290 Updated last week
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) ☆1,851 Updated 5 months ago
- ☆985 Updated last week
- Qwen-Image-Layered: Layered Decomposition for Inherent Editability ☆1,540 Updated last month
- ☆474 Updated last month
- ☆684 Updated last month
- ☆330 Updated last week
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,479 Updated last month
- A quick vibe coded app for deepseek OCR ☆1,720 Updated 2 months ago
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support… ☆819 Updated last week
- Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed. ☆741 Updated 8 months ago
- The official repository of the dots.llm1 base and instruct models proposed by rednote-hilab. ☆488 Updated 5 months ago
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat… ☆1,539 Updated 7 months ago
- ☆1,773 Updated 4 months ago
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay… ☆2,480 Updated 6 months ago
- Youtu-Tip: Tap for Intelligence, Keep on Device. ☆560 Updated 2 weeks ago
- [EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆626 Updated last month
- Qwen DianJin: LLMs for the Financial Industry by Alibaba Cloud(通义点金:阿里云金融大模型) ☆420 Updated last week
- ☆196 Updated 2 months ago
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching ☆1,240 Updated 5 months ago
- SmolDocling OCR App built using SmolDocling 256M Model and Streamlit. ☆234 Updated 10 months ago
- Out-of-the-box DeepSeek OCR document parsing Web Studio ☆542 Updated 3 months ago
- A real-time Electron-based desktop GUI for DeepSeek-OCR ☆738 Updated last month
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆1,430 Updated 4 months ago
- GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters ☆732 Updated last week