☆1,538 Jan 13, 2026 Updated last month
Alternatives and similar repositories for HunyuanOCR
Users who are interested in HunyuanOCR are comparing it to the repositories listed below.
- Official implementation of "VideoMaMa: Mask-Guided Video Matting via Generative Prior", CVPR 2026 ☆279 Feb 7, 2026 Updated 3 weeks ago
- Scaling Zero-Shot Reference-to-Video Generation ☆62 Dec 11, 2025 Updated 2 months ago
- [CVPR2026] Detect Anything via Next Point Prediction ☆1,163 Feb 22, 2026 Updated last week
- Using OpenVINO to speed up inference of PaddleOCR-VL model ☆25 Updated this week
- [ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning ☆73 Dec 17, 2025 Updated 2 months ago
- The official repo for the DanQing dataset. ☆30 Jan 16, 2026 Updated last month
- MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs ☆38 Feb 19, 2026 Updated 2 weeks ago
- ☆198 Dec 7, 2025 Updated 2 months ago
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆18,386 Jan 30, 2026 Updated last month
- Toolkit for linearizing PDFs for LLM datasets/training ☆16,947 Feb 19, 2026 Updated 2 weeks ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆8,089 Feb 10, 2025 Updated last year
- ☆370 Jul 25, 2025 Updated 7 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 Jun 28, 2025 Updated 8 months ago
- Contexts Optical Compression ☆22,596 Jan 27, 2026 Updated last month
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents ☆37 Oct 7, 2025 Updated 4 months ago
- T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation (ICCV'25) ☆44 Oct 6, 2025 Updated 4 months ago
- Official Repository of paper: "MotionEdit: Benchmarking and Learning Motion-Centric Image Editing" ☆59 Jan 20, 2026 Updated last month
- [ICCV 2025] Dynamic-VLM ☆28 Dec 16, 2024 Updated last year
- ☆109 Sep 3, 2025 Updated 6 months ago
- [CVPR 2026] 👋 Dataset and Benchmark code for EgoEdit ☆107 Feb 21, 2026 Updated last week
- Multilingual Document Layout Parsing in a Single Vision-Language Model ☆7,876 Feb 15, 2026 Updated 2 weeks ago
- Scripts for converting various datasets to MSCOCO annotation (json) files ☆12 Jun 5, 2019 Updated 6 years ago
- ☆49 Feb 9, 2026 Updated 3 weeks ago
- A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone ☆23,942 Feb 23, 2026 Updated last week
- FIBO is a SOTA, first open-source, JSON-native text-to-image model built for controllable, predictable, and legally safe image generation… ☆304 Jan 7, 2026 Updated last month
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay… ☆2,484 Aug 4, 2025 Updated 7 months ago
- ☆553 Feb 26, 2026 Updated last week
- A real-time streaming conversational video system that transforms text interactions into continuous, high-fidelity video responses using … ☆307 Dec 15, 2025 Updated 2 months ago
- AceForge is a local-first AI music workstation for Apple/OSX based on Ace-Step, DeMucs, XTTSv2 ☆63 Feb 11, 2026 Updated 3 weeks ago
- Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows. ☆55,275 Updated this week
- ☆883 Feb 13, 2026 Updated 2 weeks ago
- ☆39 Dec 4, 2023 Updated 2 years ago
- Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/… ☆71,369 Updated this week
- A Knowledge-grounded framework for Autonomous ML/AI Program Synthesis and Optimization ☆78 Feb 20, 2026 Updated last week
- The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025. ☆8,852 Dec 17, 2025 Updated 2 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,529 Updated this week
- Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation. ☆846 Dec 26, 2025 Updated 2 months ago
- ☆2,499 Jul 16, 2025 Updated 7 months ago
- HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation ☆1,208 Oct 15, 2025 Updated 4 months ago