[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
☆29 · May 22, 2025 · Updated last year
Alternatives and similar repositories for ProSA
Users that are interested in ProSA are comparing it to the libraries listed below.
- A Framework for Decoupling and Assessing the Capabilities of VLMs · ☆44 · Jun 28, 2024 · Updated 2 years ago
- ☆15 · Mar 18, 2025 · Updated last year
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" · ☆56 · May 22, 2025 · Updated last year
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset · ☆115 · May 22, 2025 · Updated last year
- [ACL 2026] OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces · ☆125 · May 12, 2026 · Updated last month
- A tiny paper rating web · ☆40 · Mar 19, 2025 · Updated last year
- Assessing Context-Aware Creative Intelligence in MLLMs · ☆23 · Jul 22, 2025 · Updated 11 months ago
- ICLR 2026 - official implementation for "MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval" · ☆108 · Apr 21, 2026 · Updated 2 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing · ☆151 · May 18, 2026 · Updated last month
- [EMNLP 2025] CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward · ☆69 · Aug 10, 2025 · Updated 10 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? · ☆33 · Aug 5, 2025 · Updated 10 months ago
- Official implementation for paper "How Far Are We from Genuinely Useful Deep Research Agents?"