allenai / FineGrainedRLHF
☆276 Updated 5 months ago
Alternatives and similar repositories for FineGrainedRLHF
Users who are interested in FineGrainedRLHF are comparing it to the libraries listed below.
- Code and data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆264 Updated 9 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆317 Updated 10 months ago
- RewardBench: the first evaluation tool for reward models. ☆604 Updated last week
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆206 Updated 2 years ago
- Self-Alignment with Principle-Following Reward Models ☆161 Updated last month
- DSIR large-scale data selection framework for language model training ☆251 Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆140 Updated last month
- A large-scale, fine-grained, diverse preference dataset (and models). ☆341 Updated last year
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA ☆217 Updated 2 years ago
- All available datasets for Instruction Tuning of Large Language Models ☆252 Updated last year
- Generative Judge for Evaluating Alignment ☆239 Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆176 Updated last year
- [ICML 2024] AlphaZero-like Tree-Search can guide large language model decoding and training ☆276 Updated last year
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆125 Updated last year
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆456 Updated 8 months ago
- Code and data for "Lost in the Middle: How Language Models Use Long Contexts" ☆347 Updated last year
- Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning ☆164 Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024] ☆556 Updated 6 months ago
- PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets ☆332 Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆158 Updated last month
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆141 Updated 4 months ago
- ☆67 Updated last year
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆177 Updated 5 months ago
- [EMNLP 2023] Adapting Language Models to Compress Long Contexts ☆306 Updated 9 months ago
- ☆48 Updated 3 months ago
- Implementation of ICML 2023 paper: Specializing Smaller Language Models towards Multi-Step Reasoning. ☆131 Updated 2 years ago
- RLHF implementation details of OAI's 2019 codebase ☆187 Updated last year
- ☆159 Updated 2 years ago
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) ☆262 Updated last year
- Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning". ☆155 Updated last year