gagan3012 / self_rewarding_models
Paper Implementation of Self-Rewarding Language Models
☆13, updated last year
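The repository's own API is not shown on this page. As a rough reminder of what the paper proposes, each self-rewarding iteration has the model generate candidate responses, score them with itself acting as an LLM-as-a-Judge, build preference pairs from the highest- and lowest-scored candidates, and train on those pairs with DPO. Below is a minimal structural sketch of that loop, assuming nothing about this repo's code; `generate`, `judge_score`, and `dpo_update` are hypothetical placeholders.

```python
import random

# Hypothetical placeholders -- not functions from this repository.
def generate(model, prompt, n=4):
    """Sample n candidate responses from the current model."""
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def judge_score(model, prompt, response):
    """Score a response 0-5 with the same model acting as LLM-as-a-Judge."""
    return random.uniform(0, 5)  # stand-in for a real judge prompt + parse

def dpo_update(model, preference_pairs):
    """One DPO training pass over (prompt, chosen, rejected) triples."""
    return model  # stand-in for an actual optimizer step

def self_rewarding_iteration(model, prompts):
    pairs = []
    for prompt in prompts:
        candidates = generate(model, prompt)
        ranked = sorted(candidates, key=lambda r: judge_score(model, prompt, r))
        # Highest-scored candidate is "chosen", lowest-scored is "rejected".
        pairs.append((prompt, ranked[-1], ranked[0]))
    return dpo_update(model, pairs)

model = object()  # placeholder for an actual language model
for _ in range(3):  # the paper's M1 -> M2 -> M3 iterations
    model = self_rewarding_iteration(model, ["Write a haiku about rivers."])
```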
Alternatives and similar repositories for self_rewarding_models
Users interested in self_rewarding_models are comparing it to the repositories listed below.
- ☆280, updated 8 months ago
- ☆51, updated 6 months ago
- A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. (☆378, updated last year)
- A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. (☆44, updated 8 months ago)
- Source code for Self-Evaluation Guided MCTS for online DPO. (☆324, updated last year)
- RLHF implementation details of OAI's 2019 codebase (☆190, updated last year)
- A large-scale, fine-grained, diverse preference dataset (and models). (☆352, updated last year)
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering. (☆16, updated 2 years ago)
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… (☆130, updated last year)
- Direct Preference Optimization from scratch in PyTorch (see the loss sketch after this list) (☆111, updated 5 months ago)
- Critique-out-Loud Reward Models (☆70, updated 11 months ago)
- ☆68, updated last year
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA (☆228, updated last month)
- (ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training (☆280, updated last year)
- Explore what LLMs are really learning over SFT (☆29, updated last year)
- [ACL 2023] Learning Multi-step Reasoning by Solving Arithmetic Tasks. https://arxiv.org/abs/2306.01707 (☆24, updated 2 years ago)
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (☆113, updated 3 months ago)
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" (☆190, updated 5 months ago)
- Reasoning with Language Model is Planning with World Model (☆173, updated 2 years ago)
- Code for ACL 2024 paper - Adversarial Preference Optimization (APO). (☆56, updated last year)
- A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF) (☆186, updated last month)
- Official Code for M-RᴇᴡᴀʀᴅBᴇɴᴄʜ: Evaluating Reward Models in Multilingual Settings (ACL 2025 Main) (☆35, updated 4 months ago)
- ☆11, updated last year
- ☆22, updated 2 years ago
- ☆75, updated last year
- ☆43, updated 6 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) (☆148, updated 7 months ago)
- An extensible benchmark for evaluating large language models on planning (☆409, updated last week)
- RewardBench: the first evaluation tool for reward models. (☆638, updated 3 months ago)
- NAACL 2021: Are NLP Models really able to Solve Simple Math Word Problems? (☆131, updated 3 years ago)
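For the "Direct Preference Optimization from scratch in PyTorch" entry above, the core of DPO is a single logistic loss on the policy-versus-reference log-probability margins of a chosen and a rejected response. A minimal sketch of that loss follows; it is not code from the listed repo, and the sequence-level log-probability inputs and the beta default are assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss on sequence-level log-probabilities, each of shape (batch,)."""
    # Implicit reward of each response: policy log-prob minus reference log-prob.
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the chosen margin above the rejected margin via a logistic loss.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with random log-probabilities (real training would compute these
# from the policy and a frozen reference model).
policy_chosen = torch.randn(4, requires_grad=True)
policy_rejected = torch.randn(4, requires_grad=True)
loss = dpo_loss(policy_chosen, policy_rejected, torch.randn(4), torch.randn(4))
loss.backward()
```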