open-compass / ProSA
[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
☆17Updated 3 weeks ago
Related projects ⓘ
Alternatives and complementary repositories for ProSA
- ☆29Updated 3 weeks ago
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆22Updated last month
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆17Updated last week
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers"☆40Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆46Updated 3 weeks ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆57Updated 5 months ago
- ☆36Updated last month
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning☆30Updated 3 months ago
- Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"☆23Updated 5 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆52Updated last month
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"☆63Updated 9 months ago
- ☆15Updated 3 months ago
- [NAACL 2024] A Synthetic, Scalable and Systematic Evaluation Suite for Large Language Models☆32Updated 5 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆29Updated 3 weeks ago
- Benchmarking Benchmark Leakage in Large Language Models☆44Updated 5 months ago
- [ATTRIB @ NeurIPS 2024] When Attention Sink Emerges in Language Models: An Empirical View☆27Updated 3 weeks ago
- ☆29Updated 2 months ago
- ☆45Updated last year
- ☆24Updated 9 months ago
- MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities.☆68Updated last month
- Official repository for paper "GTA: A Benchmark for General Tool Agents" (NeurIPS 2024 D&B Track)☆43Updated this week
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆37Updated 6 months ago
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆32Updated last year
- ☆84Updated 10 months ago
- We introduce EMMET and unify model editing with popular algorithms ROME and MEMIT.☆12Updated 2 months ago
- Code for https://arxiv.org/abs/2401.17139 (NeurIPS 2024)☆23Updated this week
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'☆11Updated 3 months ago
- ☆19Updated last month
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"☆50Updated 6 months ago
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆51Updated last month