anniedoris / design_qa
☆55Updated 3 months ago
Alternatives and similar repositories for design_qa
Users who are interested in design_qa are comparing it to the libraries listed below.
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents, CVPR 2025☆25Updated 10 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆53Updated 11 months ago
- ACL 2025: Synthetic data generation pipelines for text-rich images.☆148Updated 8 months ago
- [IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruc…☆72Updated 10 months ago
- (ICCV 2025) OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation☆94Updated 4 months ago
- [CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents☆50Updated 6 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆300Updated 2 weeks ago
- [ACL 2024] ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.☆131Updated last year
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"☆178Updated 8 months ago
- ☆69Updated last year
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆106Updated 2 months ago
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation, arXiv 2024☆64Updated last month
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆121Updated 4 months ago
- ☆80Updated last year
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆219Updated last month
- ☆75Updated last year
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆96Updated 11 months ago
- Dataset introduced in PlotQA: Reasoning over Scientific Plots☆82Updated 2 years ago
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …☆61Updated last year
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆101Updated 6 months ago
- ☆52Updated 6 months ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning.☆111Updated 9 months ago
- ☆176Updated 4 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024]☆70Updated 10 months ago
- DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems☆57Updated last year
- ☆56Updated last year
- ☆27Updated last year
- [EMNLP 2025] Official codebase for Rearank: Reasoning Re-ranking Agent☆28Updated 3 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs☆132Updated 7 months ago
- Geometric-Mean Policy Optimization☆94Updated last week