cambridgeltl / PairSLinks

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)

☆48

Alternatives and similar repositories for PairS

Users that are interested in PairS are comparing it to the libraries listed below

Sorting:

casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆61Updated last year
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆36Updated last year
QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆129Updated last year
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆43Updated last year
awslabs / rag-qa-arena
☆50Updated last year
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆84Updated last year
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆76Updated last year
googleinterns / localizing-paragraph-memorization
☆15Updated last year
orionw / FollowIR
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
☆49Updated last year
wwxu21 / CUT
Source code of "Reasons to Reject? Aligning Language Models with Judgments"
☆58Updated last year
abhika-m / FAVA
☆75Updated last year
McGill-NLP / retriever-lm-reasoning
Code for "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model", EMNLP Findings 20…
☆28Updated 2 years ago
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆149Updated last year
Re-Align / just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
☆89Updated last year
ytyz1307zzh / RefAug
Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"
☆56Updated last year
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54Updated last year
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
☆72Updated last year
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48Updated last year
neelsjain / BYOD
The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models"
☆107Updated 2 years ago
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆134Updated last year
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆50Updated 9 months ago
XiangLi1999 / AutoBencher
☆32Updated last year
Pranjal2041 / AdaptiveConsistency
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs
☆39Updated last year
allenai / noncompliance
This repository contains data, code and models for contextual noncompliance.
☆24Updated last year
facebookresearch / RLCD
Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment
☆69Updated 2 years ago
locuslab / scaling_laws_data_filtering
☆65Updated last year
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆51Updated 5 months ago
wzhouad / context-faithful-llm
Code and data for paper "Context-faithful Prompting for Large Language Models".
☆41Updated 2 years ago
yidingjiang / ado
The repository contains code for Adaptive Data Optimization
☆28Updated 11 months ago
csitfun / LogiCoT
the instructions and demonstrations for building a formal logical reasoning capable GLM
☆55Updated last year