huggingface / docmatixLinks

A huge dataset for Document Visual Question Answering

☆20

Alternatives and similar repositories for docmatix

Users that are interested in docmatix are comparing it to the libraries listed below

Sorting:

Alpha-Innovator / SimChart9K
The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.
☆26Updated last year
opendatalab / MLLM-DataEngine
MLLM-DataEngine: An Iterative Refinement Approach for MLLM
☆48Updated last year
palchenli / VL-Instruction-Tuning
☆91Updated last year
X2FD / LVIS-INSTRUCT4V
☆133Updated last year
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆59Updated 11 months ago
mynameischaos / Lion
Lion: Kindling Vision Intelligence within Large Language Models
☆51Updated last year
scenarios / WeMM
☆87Updated last year
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58Updated 2 years ago
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆115Updated 11 months ago
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76Updated last year
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆211Updated last year
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83Updated last year
thunlp / Muffin
☆66Updated last year
ParadoxZW / LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
☆34Updated last year
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆275Updated last year
bytedance / MTVQA
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…
☆63Updated 5 months ago
alibaba / conv-llava
☆119Updated last year
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆163Updated last year
MAmmoTH-VL / MAmmoTH-VL
(ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
☆48Updated 4 months ago
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆63Updated 11 months ago
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆57Updated 10 months ago
vlf-silkie / VLFeedback
☆100Updated last year
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆89Updated 2 years ago
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆67Updated 6 months ago
Share14 / ShareGemini
☆31Updated last year
RUCAIBox / ComVint
The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…
☆19Updated last year
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138Updated 2 years ago
FudanNLPLAB / MouSi
☆74Updated last year
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆83Updated 9 months ago
FuxiaoLiu / MMC
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
☆96Updated 9 months ago