Tencent-Hunyuan / HunyuanOCR
☆1,427 Updated last week
Alternatives and similar repositories for HunyuanOCR
Users who are interested in HunyuanOCR are comparing it to the repositories listed below.
- ☆817 Updated 2 months ago
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) ☆1,831 Updated 4 months ago
- GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning ☆2,112 Updated 3 weeks ago
- Cook up amazing multimodal AI applications effortlessly with MiniCPM-o ☆231 Updated last month
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,342 Updated 3 weeks ago
- ☆318 Updated 2 months ago
- ☆671 Updated last week
- Qwen-Image-Layered: Layered Decomposition for Inherent Editability ☆1,375 Updated last week
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay… ☆2,414 Updated 5 months ago
- ☆282 Updated last week
- A quick vibe coded app for deepseek OCR ☆1,543 Updated last month
- ☆385 Updated this week
- MAI-UI: Real-World Centric Foundation GUI Agents. ☆1,296 Updated this week
- ☆191 Updated last month
- an open high-performance Optical Character Recognition (OCR) toolkit ☆305 Updated 5 months ago
- Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length" ☆1,359 Updated last week
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support… ☆738 Updated 2 months ago
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat… ☆1,515 Updated 6 months ago
- Qwen DianJin: LLMs for the Financial Industry by Alibaba Cloud(通义点金:阿里云金融大模型) ☆411 Updated 3 weeks ago
- Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed. ☆722 Updated 7 months ago
- [EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆619 Updated 7 months ago
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence ☆149 Updated last year
- Video generation via code ☆1,473 Updated last month
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆1,427 Updated 3 months ago
- ☆1,696 Updated 3 months ago
- GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters ☆659 Updated last week
- Multilingual Document Layout Parsing in a Single Vision-Language Model ☆6,068 Updated 2 weeks ago
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆1,059 Updated this week
- UltraRAG v2: A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines ☆2,414 Updated this week
- A framework for efficient model inference with omni-modality models ☆1,977 Updated last week