Vision-CAIR / dochaystacksLinks

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents, CVPR 2025

☆25

Alternatives and similar repositories for dochaystacks

Users that are interested in dochaystacks are comparing it to the libraries listed below

Sorting:

haon-chen / MoCa
☆61Updated 2 months ago
DataArcTech / RagVL
Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …
☆86Updated 11 months ago
LinWeizheDragon / FLMR
The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.
☆100Updated 5 months ago
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆91Updated last year
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆115Updated 11 months ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆59Updated last year
mayubo2333 / MMLongBench-Doc
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
☆101Updated last month
yihedeng9 / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆116Updated 3 months ago
FudanNLPLAB / MouSi
☆74Updated last year
DAMO-NLP-SG / multimodal_textbook
[ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆174Updated 7 months ago
Relaxed-System-Lab / multi-actor-data-selection
This is the repo for the paper Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining.
☆45Updated 2 months ago
RifleZhang / LLaVA-Reasoner-DPO
☆98Updated 9 months ago
OpenGVLab / ChartAst
[ACL 2024] ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.
☆130Updated last year
OpenMatch / MARVEL
[ACL 2024 Oral] This is the code repo for our ACL‘24 paper "MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Mo…
☆38Updated last year
SihengLi99 / SEALONG
Large Language Models Can Self-Improve in Long-context Reasoning
☆73Updated 11 months ago
haon-chen / mmE5
☆54Updated 8 months ago
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆109Updated 5 months ago
FuxiaoLiu / MMC
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
☆96Updated 9 months ago
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆63Updated last year
LengSicong / MMR1
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
☆205Updated last month
FudanDISC / ReForm-Eval
An benchmark for evaluating the capabilities of large vision-language models (LVLMs)
☆45Updated last year
thunlp / Muffin
☆66Updated last year
RhapsodyAILab / MiniCPM-V-Embedding
☆29Updated last year
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆83Updated 9 months ago
si0wang / ThinkLite-VL
☆101Updated 4 months ago
yhy-2000 / VideoDeepResearch
☆116Updated 2 weeks ago
mathllm / MATH-V
[NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.
☆120Updated 5 months ago
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35Updated last year
hewei2001 / ReachQA
[EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs
☆56Updated 2 months ago
multimodal-art-projection / COIG-P
☆39Updated 3 months ago