open-compass/ProSA

[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

☆29

Alternatives and similar repositories for ProSA

Users who are interested in ProSA are comparing it to the libraries listed below.

SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆44 · Jun 28, 2024 · Updated 2 years ago
open-compass / Ada-LEval
The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"
☆56 · May 22, 2025 · Updated last year
xinyan-cxy / EmpathyAgent
☆15 · Mar 18, 2025 · Updated last year
open-compass / CompassVerifier
[EMNLP 2025] CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
☆68 · Aug 10, 2025 · Updated 11 months ago
open-compass / MathBench
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
☆115 · May 22, 2025 · Updated last year
open-compass / Creation-MMBench
Assessing Context-Aware Creative Intelligence in MLLMs
☆23 · Jul 22, 2025 · Updated 11 months ago
WayneJin0918 / SOTA-paper-rating.io
A tiny paper-rating website
☆41 · Mar 19, 2025 · Updated last year
open-compass / CompassJudger
The all-in-one judge models introduced by OpenCompass
☆119 · Jul 15, 2025 · Updated last year
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆33 · Aug 5, 2025 · Updated 11 months ago
jtonglet / Numerical-Hybrid-QA-Literature
A list of numerical multimodal reasoning papers and their implementations
☆11 · May 13, 2024 · Updated 2 years ago
Timothyxxx / KVCachePapers
☆20 · May 24, 2024 · Updated 2 years ago
gentaiscool / miners
MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings)
☆14 · Oct 3, 2024 · Updated last year
MME-Benchmarks / MME-CoT
MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
☆136 · Aug 5, 2025 · Updated 11 months ago
HashmatShadab / HSAT
[MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology
☆12 · Jun 17, 2025 · Updated last year
hkust-nlp / model-task-align-rl
[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆18 · Feb 9, 2026 · Updated 5 months ago
matchten / LoRA-Models-for-SAEs
Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"
☆17 · Mar 31, 2025 · Updated last year
OpenEvaluation / VLMEvalKit
☆23 · Apr 11, 2026 · Updated 3 months ago
allenai / noncompliance
This repository contains data, code and models for contextual noncompliance.
☆26 · Jul 18, 2024 · Updated 2 years ago
ronghanghu / moco_v3_tpu
☆16 · Apr 10, 2022 · Updated 4 years ago
PlusLabNLP / VISCO
[CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
☆13 · Jun 7, 2025 · Updated last year
kkkevinkkkkk / situated_faithfulness
☆14 · Oct 17, 2024 · Updated last year
RyanHangZhou / tensorflow-DUP-Net
TensorFlow implementation of "DUP-Net: Denoiser and Upsampler Network for 3D Adversarial Point Clouds Defense", ICCV 2019
☆15 · Aug 10, 2021 · Updated 4 years ago
OPPO-PersonalAI / FINDER_DEFT
Official implementation for paper "How Far Are We from Genuinely Useful Deep Research Agents?"
☆65 · Dec 10, 2025 · Updated 7 months ago
open-compass / RePro
[ICLR 2026] Rectifying LLM Thought From Lens of Optimization
☆15 · Dec 5, 2025 · Updated 7 months ago
xinyan-cxy / MINT-CoT
[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
☆107 · Sep 19, 2025 · Updated 10 months ago
StigLidu / TURN
[ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference"
☆23 · Feb 16, 2025 · Updated last year
taoyds / grappa
☆31 · Sep 4, 2021 · Updated 4 years ago
VITA-Group / LoCoCo
[ICML 2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen
☆17 · Sep 7, 2024 · Updated last year
StigLidu / CodeGym
[ICLR2026] The official repository for the CodeGym project: "Generalizable End-to-End Tool-Use RL with Synthetic CodeGym"
☆32 · Oct 14, 2025 · Updated 9 months ago
SPIRAL-MED / Ophiuchus
☆41 · Jan 14, 2025 · Updated last year
microsoft / x-reasoner
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
☆49 · Feb 4, 2026 · Updated 5 months ago
whitepurple / HBM-loss-for-HTC
[ACL 2024 Findings] Hierarchy-aware Biased Bound Margin Loss Function for Hierarchical Text Classification
☆15 · Oct 26, 2024 · Updated last year
ZhentingWang / RONAN
☆16 · May 23, 2024 · Updated 2 years ago
vickywu1022 / OntoProbe-PLMs
Repository for the ACL 2023 outstanding paper "Do PLMs Know and Understand Ontological Knowledge?"
☆33 · Oct 16, 2023 · Updated 2 years ago
beichenzbc / BoostStep
Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"
☆37 · Jan 21, 2025 · Updated last year
neelnanda-io / Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
☆14 · Feb 13, 2023 · Updated 3 years ago
DigitalHarborFoundation / FlexEval
FlexEval is an LLM evaluation tool designed for practical quantitative analysis.
☆16 · Updated this week
ytyz1307zzh / RefAug
Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"
☆55 · Oct 1, 2024 · Updated last year
rohit901 / VANE-Bench
[NAACL'25] Contains code and documentation for our VANE-Bench paper.
☆24 · Aug 19, 2025 · Updated 11 months ago