zai-org / GLM-OCR
GLM-OCR: Accurate × Fast × Comprehensive
☆505 · Updated this week
Alternatives and similar repositories for GLM-OCR
Users interested in GLM-OCR are comparing it to the repositories listed below.
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab. ☆283 · Updated 4 months ago
- ☆194 · Updated last month
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆195 · Updated last year
- ☆1,515 · Updated 3 weeks ago
- ☆925 · Updated 2 weeks ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ☆269 · Updated 2 weeks ago
- ☆187 · Updated 11 months ago
- [ICCV 2025] A Token-level Text Image Foundation Model for Document Understanding ☆129 · Updated 5 months ago
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence ☆149 · Updated last year
- A reproduction of the DeepSeek-OCR model, including training ☆206 · Updated 2 months ago
- Cook up amazing multimodal AI applications effortlessly with MiniCPM-o ☆242 · Updated last month
- ☆101 · Updated last year
- Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, English/Chinese) ☆86 · Updated last year
- ☆57 · Updated 2 years ago
- ☆142 · Updated last year
- [ACM MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token" ☆259 · Updated 9 months ago
- [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ☆241 · Updated 2 months ago
- PDF parsing tool: GOT's vLLM acceleration implementation, MinerU for layout recognition, and GOT for table and formula parsing. ☆65 · Updated last year
- The official code for the NeurIPS 2024 paper "Harmonizing Visual Text Comprehension and Generation" ☆129 · Updated last year
- ☆869 · Updated 3 months ago
- GLM Series Edge Models ☆157 · Updated 7 months ago
- Qwen DianJin: LLMs for the Financial Industry by Alibaba Cloud (Tongyi DianJin: Alibaba Cloud's large model for finance) ☆420 · Updated last week
- ☆47 · Updated 11 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆65 · Updated last year
- Step3-VL-10B: A compact yet frontier multimodal model achieving SOTA performance at the 10B scale, matching open-source models 10-20x its… ☆378 · Updated 2 weeks ago
- ☆187 · Updated last year
- Visual Causal Flow ☆1,306 · Updated last week
- [AAAI 2025 Oral] Predicting the Original Appearance of Damaged Historical Documents ☆101 · Updated 6 months ago
- Youtu-Embedding is an industry-leading, general-purpose text representation model developed by Tencent Youtu Lab. ☆174 · Updated 2 months ago
- ☆326 · Updated 2 months ago