open-compass / ProSA
[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
☆24 · Updated 3 months ago
Alternatives and similar repositories for ProSA:
Users interested in ProSA are comparing it to the repositories listed below.
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆49 · Updated 4 months ago
- Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models" ☆41 · Updated 4 months ago
- [ICLR 2025] SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights ☆53 · Updated last week
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆23 · Updated 4 months ago
- Official repository of "Are Your LLMs Capable of Stable Reasoning?" ☆18 · Updated this week
- Unofficial implementation of "Chain-of-Thought Reasoning Without Prompting" ☆27 · Updated 11 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension ☆63 · Updated 8 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 · Updated last week
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆107 · Updated 9 months ago
- Official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 · Updated 11 months ago
- Official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" ☆53 · Updated 9 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆48 · Updated 2 weeks ago
- MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension ☆39 · Updated 2 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆34 · Updated 2 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024] ☆50 · Updated last month
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model (https://arxiv.org/pdf/2411.02433) ☆21 · Updated 2 months ago
- Long Context Extension and Generalization in LLMs ☆48 · Updated 5 months ago
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆59 · Updated 3 months ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆38 · Updated 2 months ago
- Official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆25 · Updated last week
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales ☆72 · Updated 2 weeks ago
- Official PyTorch implementation of "Task Vectors are Cross-Modal" ☆21 · Updated 2 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆40 · Updated 3 months ago
- Code and dataset for the paper "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆50 · Updated 3 months ago
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25) ☆59 · Updated 3 weeks ago