Ucas-HaoranWei / Vary-tiny-600k
Vary-tiny codebase built upon LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, English and Chinese)
☆86 · Sep 21, 2024 · Updated last year
Alternatives and similar repositories for Vary-tiny-600k
Users that are interested in Vary-tiny-600k are comparing it to the libraries listed below
- ☆57 · Jan 23, 2024 · Updated 2 years ago
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token" ☆259 · Apr 14, 2025 · Updated 10 months ago
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆195 · May 31, 2024 · Updated last year
- Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary) ☆629 · Dec 30, 2024 · Updated last year
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step ☆159 · Jul 28, 2025 · Updated 6 months ago
- ☆142 · Feb 13, 2024 · Updated 2 years ago
- SPRINT: Script-agnostic Structure Recognition in Tables ☆16 · Mar 26, 2025 · Updated 10 months ago
- Using llama.cpp and onnxruntime to accelerate inference of GOT-OCR2.0 ☆15 · Mar 6, 2025 · Updated 11 months ago
- Compute benchmark of table structure recognition ☆28 · Dec 2, 2025 · Updated 2 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆23 · Sep 26, 2024 · Updated last year
- Research on accelerating GOT-OCR for production deployment, not limited to any one language ☆62 · Oct 24, 2024 · Updated last year
- Open-Retrieval Conversational Machine Reading: A new setting & OR-ShARC dataset ☆13 · Nov 19, 2022 · Updated 3 years ago
- POM: Occupancy map estimation for people detection ☆10 · Aug 5, 2014 · Updated 11 years ago
- On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench) ☆793 · Jul 5, 2025 · Updated 7 months ago
- PDF Parsing Tool: GOT's vLLM acceleration implementation, MinerU for layout recognition, and GOT for table and formula parsing ☆65 · Nov 7, 2024 · Updated last year
- The official code for "OG-HFYOLO: Orientation Gradient Guidance and Heterogeneous Feature Fusion For Deformation Table Cell Instance Segm… ☆13 · Jul 28, 2025 · Updated 6 months ago
- Keypoint dataset for airplane ☆10 · Dec 28, 2019 · Updated 6 years ago
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models ☆152 · Jan 13, 2025 · Updated last year
- [ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models ☆152 · Dec 5, 2024 · Updated last year
- Training a small but elegant formula-recognition model based on TrOCR and the UniMER-1M dataset ☆29 · Jun 23, 2025 · Updated 7 months ago
- MLLM @ Game ☆16 · May 12, 2025 · Updated 9 months ago
- ☆19 · Sep 11, 2024 · Updated last year
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder and unifying image understanding and generation ☆39 · Jun 20, 2024 · Updated last year
- This repo is used to release the ArxivFormula dataset ☆35 · Nov 12, 2024 · Updated last year
- A High-efficiency Open-source Toolkit for Table-to-LaTeX Task ☆275 · Dec 6, 2025 · Updated 2 months ago
- Code of the COLING22 paper "uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers" ☆19 · Aug 17, 2022 · Updated 3 years ago
- A simple MLLM that surpasses QwenVL-Max using only open-source data with a 14B LLM ☆38 · Sep 9, 2024 · Updated last year
- [EMNLP 2023] Once Upon a *Time* in *Graph*: Relative-Time Pretraining for Complex Temporal Reasoning ☆17 · Oct 31, 2023 · Updated 2 years ago
- Torch TH/THC C++11 wrapper ☆14 · Jun 14, 2017 · Updated 8 years ago
- Code for EMNLP-2018 paper "Variational Autoregressive Decoder for Neural Response Generation" ☆16 · Oct 11, 2019 · Updated 6 years ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆412 · May 5, 2025 · Updated 9 months ago
- A huge dataset for Document Visual Question Answering ☆20 · Jul 29, 2024 · Updated last year
- ☆19 · Apr 3, 2023 · Updated 2 years ago
- Code for the paper "Partially-Aligned Data-to-Text Generation with Distant Supervision" in EMNLP 2020 ☆19 · Jan 15, 2021 · Updated 5 years ago
- ☆21 · Sep 17, 2021 · Updated 4 years ago
- ☆23 · Jan 8, 2024 · Updated 2 years ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ☆2,371 · May 30, 2025 · Updated 8 months ago
- Daily tracking of awesome AIGC papers, including video generation, video editing, and animation ☆24 · Aug 20, 2025 · Updated 5 months ago
- ☆48 · Sep 5, 2024 · Updated last year