gagan3012 / self_rewarding_models
Paper Implementation of Self-Rewarding Language Models
☆13, updated last year
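The repository's own API is not shown on this page. As a rough reminder of what the paper proposes, each self-rewarding iteration has the model generate candidate responses, score them with itself acting as an LLM-as-a-Judge, build preference pairs from the highest- and lowest-scored candidates, and train on those pairs with DPO. Below is a minimal structural sketch of that loop, assuming nothing about this repo's code; `generate`, `judge_score`, and `dpo_update` are hypothetical placeholders.

```python
import random

# Hypothetical placeholders -- not functions from this repository.
def generate(model, prompt, n=4):
    """Sample n candidate responses from the current model."""
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def judge_score(model, prompt, response):
    """Score a response 0-5 with the same model acting as LLM-as-a-Judge."""
    return random.uniform(0, 5)  # stand-in for a real judge prompt + parse

def dpo_update(model, preference_pairs):
    """One DPO training pass over (prompt, chosen, rejected) triples."""
    return model  # stand-in for an actual optimizer step

def self_rewarding_iteration(model, prompts):
    pairs = []
    for prompt in prompts:
        candidates = generate(model, prompt)
        ranked = sorted(candidates, key=lambda r: judge_score(model, prompt, r))
        # Highest-scored candidate is "chosen", lowest-scored is "rejected".
        pairs.append((prompt, ranked[-1], ranked[0]))
    return dpo_update(model, pairs)

model = object()  # placeholder for an actual language model
for _ in range(3):  # the paper's M1 -> M2 -> M3 iterations
    model = self_rewarding_iteration(model, ["Write a haiku about rivers."])
```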
Alternatives and similar repositories for self_rewarding_models
Users interested in self_rewarding_models are comparing it to the repositories listed below.
- ☆280, updated 8 months ago
- ☆51, updated 6 months ago
- A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. (☆378, updated last year)
- A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. (☆44, updated 8 months ago)
- Source code for Self-Evaluation Guided MCTS for online DPO. (☆324, updated last year)
- RLHF implementation details of OAI's 2019 codebase (☆190, updated last year)
- A large-scale, fine-grained, diverse preference dataset (and models). (☆352, updated last year)
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering. (☆16, updated 2 years ago)
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… (☆130, updated last year)
- Direct Preference Optimization from scratch in PyTorch (see the loss sketch after this list) (☆111, updated 5 months ago)
- Critique-out-Loud Reward Models (☆70, updated 11 months ago)
- ☆68, updated last year
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA (☆228, updated last month)
- (ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training (☆280, updated last year)
- Explore what LLMs are really learning over SFT (☆29, updated last year)
- [ACL 2023] Learning Multi-step Reasoning by Solving Arithmetic Tasks. https://arxiv.org/abs/2306.01707 (☆24, updated 2 years ago)
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (☆113, updated 3 months ago)
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" (☆190, updated 5 months ago)
- Reasoning with Language Model is Planning with World Model (☆173, updated 2 years ago)
- Code for ACL 2024 paper - Adversarial Preference Optimization (APO). (☆56, updated last year)
- A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF) (☆186, updated last month)
- Official Code for M-RᴇᴡᴀʀᴅBᴇɴᴄʜ: Evaluating Reward Models in Multilingual Settings (ACL 2025 Main) (☆35, updated 4 months ago)
- ☆11, updated last year
- ☆22, updated 2 years ago
- ☆75, updated last year
- ☆43, updated 6 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) (☆148, updated 7 months ago)
- An extensible benchmark for evaluating large language models on planning (☆409, updated last week)
- RewardBench: the first evaluation tool for reward models. (☆638, updated 3 months ago)
- NAACL 2021: Are NLP Models really able to Solve Simple Math Word Problems? (☆131, updated 3 years ago)
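For the "Direct Preference Optimization from scratch in PyTorch" entry above, the core of DPO is a single logistic loss on the policy-versus-reference log-probability margins of a chosen and a rejected response. A minimal sketch of that loss follows; it is not code from the listed repo, and the sequence-level log-probability inputs and the beta default are assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss on sequence-level log-probabilities, each of shape (batch,)."""
    # Implicit reward of each response: policy log-prob minus reference log-prob.
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the chosen margin above the rejected margin via a logistic loss.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with random log-probabilities (real training would compute these
# from the policy and a frozen reference model).
policy_chosen = torch.randn(4, requires_grad=True)
policy_rejected = torch.randn(4, requires_grad=True)
loss = dpo_loss(policy_chosen, policy_rejected, torch.randn(4), torch.randn(4))
loss.backward()
```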