anniedoris / design_qa
☆56Updated 5 months ago
Alternatives and similar repositories for design_qa
Users who are interested in design_qa are comparing it to the libraries listed below.
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆53Updated last year
- ACL 2025: Synthetic data generation pipelines for text-rich images.☆152Updated 10 months ago
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents, CVPR 2025☆25Updated 11 months ago
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆97Updated last year
- [EMNLP 2025] Official codebase for Rearank: Reasoning Re-ranking Agent☆32Updated 4 months ago
- (ICCV 2025) OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation☆95Updated last month
- ☆56Updated last year
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆220Updated 2 months ago
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.☆58Updated last year
- [NeurIPS 2025 Spotlight] Official repository for "Web-Shepherd: Advancing PRMs for Reinforcing Web Agents"☆51Updated 7 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆126Updated 5 months ago
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)☆159Updated last year
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆340Updated 3 weeks ago
- ☆69Updated last year
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"☆117Updated 6 months ago
- ☆51Updated 8 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.☆77Updated 5 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs☆137Updated 8 months ago
- [IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruc…☆73Updated 11 months ago
- [CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents☆54Updated 7 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆104Updated 7 months ago
- ☆71Updated last year
- ☆105Updated 7 months ago
- (ACL-2025 main conference) Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback☆38Updated 6 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆92Updated last year
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step☆158Updated 5 months ago
- ☆226Updated 10 months ago
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context☆41Updated last year
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"☆86Updated last year
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024]☆71Updated last year