Tencent / POINTS-Reader
☆45Updated this week
Alternatives and similar repositories for POINTS-Reader
Users who are interested in POINTS-Reader are comparing it to the repositories listed below.
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆38Updated last year
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆38Updated last year
- ☆29Updated last year
- ☆57Updated last year
- ☆99Updated 8 months ago
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆154Updated last year
- VimTS: A Unified Video and Image Text Spotter☆79Updated 10 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆249Updated last month
- GLM Series Edge Models☆149Updated 3 months ago
- Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth☆171Updated last week
- [ICCV2025] A Token-level Text Image Foundation Model for Document Understanding☆115Updated 2 weeks ago
- Chinese CLIP models with SOTA performance.☆57Updated 2 years ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆141Updated 7 months ago
- Dedicated Colab notebooks for experimenting with Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoo OCR 3B, and more on a free-tier T4 GPU☆20Updated last month
- Studying GOT-OCR: accelerating project deployment, not limited to any language☆61Updated 10 months ago
- Our 2nd-gen LMM☆34Updated last year
- ☆177Updated 7 months ago
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.☆244Updated 2 weeks ago
- Exploration of the multi modal fuyu-8b model of Adept. 🤓 🔍☆27Updated last year
- [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding☆32Updated last month
- ☆39Updated last month
- xllamacpp - a Python wrapper of llama.cpp☆54Updated 2 weeks ago
- Deep Reasoning Translation (DRT) Project☆230Updated last week
- ☆14Updated last year
- For learning GOT/Qwen/OnnxLLm☆53Updated 11 months ago
- ☆137Updated 3 weeks ago
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"☆73Updated this week
- ☆79Updated last year
- A novel multi-modality (Vision) RAG architecture☆29Updated 11 months ago
- Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k, including English and Chinese)☆86Updated 11 months ago