anniedoris / design_qa
☆54 · Updated 2 months ago
Alternatives and similar repositories for design_qa
Users interested in design_qa are comparing it to the repositories listed below.
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents, CVPR 2025 ☆25 · Updated 9 months ago
- ACL 2025: Synthetic data generation pipelines for text-rich images. ☆145 · Updated 8 months ago
- ☆50 · Updated 5 months ago
- Code for paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆53 · Updated 10 months ago
- [IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruc… ☆72 · Updated 9 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆296 · Updated 2 weeks ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆219 · Updated 2 weeks ago
- [ACL 2024] ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning. ☆130 · Updated last year
- Geometric-Mean Policy Optimization ☆89 · Updated 3 weeks ago
- [NeurIPS 2025 Spotlight] Official repository for "Web-Shepherd: Advancing PRMs for Reinforcing Web Agents" ☆47 · Updated 5 months ago
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities. ☆120 · Updated 5 months ago
- ☆120 · Updated last month
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆174 · Updated 7 months ago
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations ☆102 · Updated last month
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step ☆150 · Updated 3 months ago
- ☆221 · Updated 8 months ago
- (ICCV 2025) OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation ☆93 · Updated 4 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆222 · Updated this week
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆128 · Updated 6 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆132 · Updated last year
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆96 · Updated 10 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆143 · Updated 7 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆115 · Updated 3 months ago
- The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration" ☆120 · Updated 2 months ago
- ☆68 · Updated last year
- DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems ☆53 · Updated last year
- This repository is maintained to release dataset and models for multimodal puzzle reasoning. ☆108 · Updated 8 months ago
- ☆74 · Updated last year
- A family of highly capable yet efficient large multimodal models ☆191 · Updated last year
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038) ☆187 · Updated 2 months ago