vicgalle / zero-shot-reward-modelsLinks

ZYN: Zero-Shot Reward Models with Yes-No Questions

☆34

Alternatives and similar repositories for zero-shot-reward-models

Users that are interested in zero-shot-reward-models are comparing it to the libraries listed below

Sorting:

csitfun / LogiCoT
the instructions and demonstrations for building a formal logical reasoning capable GLM
☆53Updated 10 months ago
feyzaakyurek / rl4f
Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.
☆64Updated 7 months ago
ShiZhengyan / PowerfulPromptFT
[NeurIPS 2023 Main Track] This is the repository for the paper titled "Don’t Stop Pretraining? Make Prompt-based Fine-tuning Powerful Lea…
☆74Updated last year
princeton-nlp / InstructEval
[NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.
☆22Updated last year
GasolSun36 / Iter-CoT
[NAACL 2024] Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models
☆85Updated last year
TianHongZXY / CoRe
[ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement)
☆49Updated last year
allenai / sso
Repository for Skill Set Optimization
☆14Updated 11 months ago
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆50Updated last month
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences
☆71Updated last year
Re-Align / just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
☆85Updated last year
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆43Updated last year
petezh / OpenD5
Tasks for describing differences between text distributions.
☆16Updated 11 months ago
facebookresearch / RLCD
Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment
☆69Updated last year
debjitpaul / refiner
About The corresponding code from our paper " REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t…
☆70Updated last year
wwxu21 / CUT
Source code of "Reasons to Reject? Aligning Language Models with Judgments"
☆58Updated last year
RUCAIBox / BAMBOO
☆35Updated last year
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆30Updated 2 weeks ago
Pranjal2041 / AdaptiveConsistency
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs
☆37Updated last year
maszhongming / ParaKnowTransfer
Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"
☆32Updated last year
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48Updated last year
NanshineLoong / Self-Evolving-Benchmark
A framework for evolving and testing question-answering datasets with various models.
☆16Updated last year
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54Updated last year
microsoft / RLHF-APA
RL algorithm: Advantage induced policy alignment
☆65Updated last year
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆81Updated 11 months ago
xhan77 / in-context-alignment
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
☆35Updated last year
WadeYin9712 / Dynosaur
Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023)
☆64Updated last year
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆36Updated last year
itl-ed / llm-dp
LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task
☆44Updated 6 months ago
zjunlp / TRICE
[NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback
☆41Updated last year
psunlpgroup / ReaLMistake
This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".
☆30Updated 10 months ago