Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k, including English and Chinese)
☆86Sep 21, 2024Updated last year
Alternatives and similar repositories for Vary-tiny-600k
Users that are interested in Vary-tiny-600k are comparing it to the libraries listed below
- ☆57Jan 23, 2024Updated 2 years ago
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"☆260Apr 14, 2025Updated 10 months ago
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆194May 31, 2024Updated last year
- Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)☆630Dec 30, 2024Updated last year
- [ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.☆1,897Dec 30, 2024Updated last year
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step☆159Jul 28, 2025Updated 7 months ago
- ☆142Feb 13, 2024Updated 2 years ago
- SPRINT: Script-agnostic Structure Recognition in Tables☆16Mar 26, 2025Updated 11 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model☆8,092Feb 10, 2025Updated last year
- Compute benchmark of table structure recognition.☆28Dec 2, 2025Updated 3 months ago
- Research on accelerating GOT-OCR for production deployment, not limited to any language☆62Oct 24, 2024Updated last year
- LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs☆415Dec 20, 2025Updated 2 months ago
- POM: Occupancy map estimation for people detection☆10Aug 5, 2014Updated 11 years ago
- On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)☆797Jul 5, 2025Updated 8 months ago
- PDF Parsing Tool: GOT's vLLM acceleration implementation, MinerU for layout recognition, and GOT for table formula parsing.☆65Nov 7, 2024Updated last year
- The official code for "OG-HFYOLO: Orientation Gradient Guidance and Heterogeneous Feature Fusion For Deformation Table Cell Instance Segm…☆13Jul 28, 2025Updated 7 months ago
- Keypoint dataset for airplane☆10Dec 28, 2019Updated 6 years ago
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models☆152Jan 13, 2025Updated last year
- [ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models☆153Dec 5, 2024Updated last year
- Code and data for the paper: DTSM: Toward Dense Table Structure Recognition with Text Query Encoder and Adjacent Feature Aggregator☆12Apr 28, 2024Updated last year
- GUI version of GOT-OCR, providing OCR, PDF export, batch processing, and other features, but no training functionality☆182Nov 11, 2025Updated 3 months ago
- ☆13May 9, 2023Updated 2 years ago
- MLLM @ Game☆16May 12, 2025Updated 9 months ago
- ☆19Sep 11, 2024Updated last year
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆39Jun 20, 2024Updated last year
- This repo is used to release the ArxivFormula dataset.☆35Nov 12, 2024Updated last year
- A High-efficiency Open-source Toolkit for Table-to-Latex Task☆275Dec 6, 2025Updated 3 months ago
- The source code of our ACL paper "A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance an…☆14May 6, 2023Updated 2 years ago
- [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness☆447May 14, 2025Updated 9 months ago
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)☆1,948Jan 24, 2026Updated last month
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆109Oct 25, 2024Updated last year
- [EMNLP 2023] Once Upon a *Time* in *Graph*: Relative-Time Pretraining for Complex Temporal Reasoning☆17Oct 31, 2023Updated 2 years ago
- torch TH/THC c++11 wrapper☆14Jun 14, 2017Updated 8 years ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data, with a 14B LLM.☆38Sep 9, 2024Updated last year
- Code for EMNLP-2018 paper "Variational Autoregressive Decoder for Neural Response Generation"☆16Oct 11, 2019Updated 6 years ago
- A pytorch pretrained model of MnasNet☆21Dec 3, 2019Updated 6 years ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆413May 5, 2025Updated 10 months ago
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition☆458Sep 28, 2025Updated 5 months ago
- ☆20Apr 3, 2023Updated 2 years ago