Vision-CAIR / dochaystacks
☆16 · Updated last month
Alternatives and similar repositories for dochaystacks:
Users interested in dochaystacks are comparing it to the libraries listed below.
- A Self-Training Framework for Vision-Language Reasoning ☆66 · Updated last month
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆78 · Updated 8 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆24 · Updated 2 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆48 · Updated 2 months ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆38 · Updated 3 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension ☆63 · Updated 9 months ago
- ☆66 · Updated last month
- An Easy-to-use Hallucination Detection Framework for LLMs ☆57 · Updated 10 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆91 · Updated this week
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations ☆67 · Updated 7 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆60 · Updated 3 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆51 · Updated 4 months ago
- Large Language Models Can Self-Improve in Long-context Reasoning ☆62 · Updated 3 months ago
- [NeurIPS DB Track, 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities ☆87 · Updated this week
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆49 · Updated 4 months ago
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models ☆24 · Updated last month
- ☆45 · Updated 4 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆33 · Updated 8 months ago
- A simulated dataset consisting of 9,536 charts and associated data annotations in CSV format ☆22 · Updated last year
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆97 · Updated 2 months ago
- ☆73 · Updated 11 months ago
- Official Code of IdealGPT ☆34 · Updated last year
- ☆49 · Updated last year
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 · Updated this week
- ☆62 · Updated 8 months ago
- ☆17 · Updated last year
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆27 · Updated 7 months ago
- ☆35 · Updated last month
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 · Updated 7 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆112 · Updated 3 months ago