huggingface / docmatix
A huge dataset for Document Visual Question Answering
☆13Updated 3 months ago
Related projects
Alternatives and complementary repositories for docmatix
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…☆45Updated last month
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆41Updated 4 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 4 months ago
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆55Updated 3 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆54Updated 3 weeks ago
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆21Updated 8 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆52Updated last month
- Touchstone: Evaluating Vision-Language Models by Language Models☆77Updated 9 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video…☆27Updated 4 months ago
- The released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆32Updated last year
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆31Updated 4 months ago
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context☆16Updated 2 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆56Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark☆50Updated last year
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆22Updated 4 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆57Updated 5 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆38Updated 4 months ago
- [ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents☆16Updated 7 months ago
- Code for the paper titled "CiT: Curation in Training for Effective Vision-Language Data".☆78Updated last year
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆98Updated 3 weeks ago
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"☆62Updated this week
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆13Updated 3 weeks ago