Ucas-HaoranWei / Vary-tiny-600k
Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, covering English and Chinese)
☆68Updated 2 months ago
Related projects
Alternatives and complementary repositories for Vary-tiny-600k
- ☆126Updated 9 months ago
- Datasets and Evaluation Scripts for CompHRDoc☆25Updated 7 months ago
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆128Updated 5 months ago
- ☆55Updated 9 months ago
- ☆156Updated 8 months ago
- Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)☆121Updated last year
- ☆67Updated this week
- Document Artificial Intelligence☆130Updated last month
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆51Updated 2 weeks ago
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…☆45Updated last month
- 【ArXiv】PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling☆98Updated last month
- ☆106Updated 9 months ago
- Code for ICCV 2023 Paper: “ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction”☆50Updated last year
- The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation☆73Updated this week
- ICDAR 2024 Table OCR Model☆19Updated last month
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence☆49Updated 2 months ago
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"☆197Updated last month
- This repo is used to release the ArxivFormula dataset.☆24Updated last week
- An open-source implementation for fine-tuning Qwen2-VL series by Alibaba Cloud.☆113Updated 2 weeks ago
- ☆201Updated 3 weeks ago
- The official PyTorch implementation of SEMv3.☆27Updated 5 months ago
- An unofficial Pytorch implementation of ERNIE-Layout which is originally released through PaddleNLP.☆99Updated last year
- WikiTableSet: The largest publicly available image-based table recognition dataset in three languages built from Wikipedia☆25Updated last year
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)☆145Updated 5 months ago
- Contrast-guided Feature Adjustment Module for Visual Information Extraction☆28Updated last year
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …☆164Updated last month
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"☆68Updated last week
- RoDLA: Benchmarking the Robustness of Document Layout Analysis Models☆28Updated 7 months ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆36Updated 2 months ago
- ☆50Updated 5 months ago