open-compass / ProSA
[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
☆25 Updated 6 months ago
Alternatives and similar repositories for ProSA:
Users who are interested in ProSA are comparing it to the repositories listed below:
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆56 Updated 6 months ago
- Official Repository of Are Your LLMs Capable of Stable Reasoning? ☆25 Updated last month
- Large Language Models Can Self-Improve in Long-context Reasoning ☆69 Updated 5 months ago
- ☆38 Updated 2 months ago
- Knowledge Unlearning for Large Language Models ☆25 Updated this week
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆93 Updated this week
- ☆40 Updated this week
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆43 Updated 2 months ago
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆42 Updated last week
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆69 Updated last month
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆24 Updated 7 months ago
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs ☆39 Updated 5 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆56 Updated 2 months ago
- ☆14 Updated 4 months ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆42 Updated 5 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆57 Updated 3 months ago
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆67 Updated 2 months ago
- Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref… ☆54 Updated 2 months ago
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆63 Updated 2 months ago
- Code for "A Sober Look at Progress in Language Model Reasoning" paper ☆45 Updated this week
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆42 Updated 6 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆65 Updated 11 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆84 Updated this week
- MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension ☆44 Updated 5 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆53 Updated 6 months ago
- Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning" ☆26 Updated 11 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆58 Updated last year
- [Preprint] A Generalizable and Purely Unsupervised Self-Training Framework ☆56 Updated 3 weeks ago
- ☆17 Updated 4 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024] ☆57 Updated 3 months ago