Tencent-Hunyuan / HunyuanOCR
☆1,288 Updated 2 weeks ago
Alternatives and similar repositories for HunyuanOCR
Users who are interested in HunyuanOCR are comparing it to the repositories listed below.
- ☆805 Updated 2 months ago
- ☆654 Updated last month
- Cook up amazing multimodal AI applications effortlessly with MiniCPM-o ☆230 Updated last week
- GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning ☆2,061 Updated this week
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) ☆1,817 Updated 3 months ago
- [EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆617 Updated 6 months ago
- ☆188 Updated 2 weeks ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,284 Updated last week
- Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed. ☆709 Updated 6 months ago
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching ☆1,206 Updated 4 months ago
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab. ☆276 Updated 2 months ago
- ☆305 Updated last month
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support… ☆723 Updated 2 months ago
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat… ☆1,516 Updated 6 months ago
- Qwen DianJin: LLMs for the Financial Industry by Alibaba Cloud(通义点金:阿里云金融大模型) ☆395 Updated 2 weeks ago
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence ☆149 Updated last year
- SmolDocling OCR App built using SmolDocling 256M Model and Streamlit. ☆170 Updated 8 months ago
- ☆133 Updated 8 months ago
- The official repository of the dots.llm1 base and instruct models proposed by rednote-hilab. ☆474 Updated 4 months ago
- ☆1,650 Updated 2 months ago
- A quick vibe coded app for deepseek OCR ☆1,520 Updated last month
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆1,421 Updated 3 months ago
- ☆986 Updated 8 months ago
- ☆1,164 Updated 2 months ago
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆1,129 Updated 5 months ago
- [CVPR 2025] This is an official inference code of the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Gener… ☆294 Updated 8 months ago
- Video generation via code ☆1,376 Updated 3 weeks ago
- an open high-performance Optical Character Recognition (OCR) toolkit ☆305 Updated 4 months ago
- ☆241 Updated 10 months ago
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay… ☆2,413 Updated 4 months ago