vwxyzjn / summarize_from_feedback_details
☆160, updated Nov 23, 2024
Alternatives and similar repositories for summarize_from_feedback_details
Users interested in summarize_from_feedback_details are comparing it to the libraries listed below.
- RLHF implementation details of OAI's 2019 codebase (☆197, updated Jan 14, 2024)
- RewardBench: the first evaluation tool for reward models (☆687, updated Jan 31, 2026)
- Official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" (☆17, updated Feb 22, 2024)
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models (☆28, updated Mar 22, 2024)
- Dataset Reset Policy Optimization (☆31, updated Apr 12, 2024)
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) (☆73, updated Jun 25, 2024)
- ☆13, updated Jun 4, 2024
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" (☆33, updated Dec 14, 2023)
- Directional Preference Alignment (☆58, updated Sep 23, 2024)
- ☆322, updated Jul 25, 2024
- ☆282, updated Jan 6, 2025
- Recipes to train reward models for RLHF (☆1,512, updated Apr 24, 2025)
- [ACL 2024 Findings] DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling (☆18, updated Jun 6, 2024)
- ☆16, updated Jul 23, 2024
- Code to accompany the paper "The Information Geometry of Unsupervised Reinforcement Learning" (☆20, updated Oct 6, 2021)
- Code and example data for the paper "Rule Based Rewards for Language Model Safety" (☆206, updated Jul 19, 2024)
- Gym wrapper for pysc2 (☆10, updated Sep 16, 2022)
- ☆11, updated Mar 13, 2023
- ☆20, updated Nov 4, 2025
- A Framework for Decoupling and Assessing the Capabilities of VLMs (☆43, updated Jun 28, 2024)
- Counterfactual Evaluation and Learning for Interactive Systems: Foundations, Implementations, and Recent Advances (☆12, updated Aug 14, 2022)
- Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" (☆47, updated Jan 19, 2024)
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024] (☆588, updated Dec 9, 2024)
- ☆313, updated Jun 9, 2024
- A recipe for online RLHF and online iterative DPO (☆539, updated Dec 28, 2024)
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning (☆21, updated Jul 9, 2023)
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models (☆454, updated Feb 1, 2024)
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" (☆39, updated Jan 12, 2024)
- Self-Alignment with Principle-Following Reward Models (☆169, updated Sep 18, 2025)
- [ICML 2024] Official repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment (☆57, updated Jun 16, 2024)
- ☆331, updated May 31, 2025
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs) (☆905, updated Sep 30, 2025)
- Official repository for ORPO (☆471, updated May 31, 2024)
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts (☆16, updated Feb 26, 2024)
- Extending context length of visual language models (☆12, updated Dec 18, 2024)
- Code and configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models (☆68, updated Apr 26, 2025)
- Scalable toolkit for efficient model alignment (☆852, updated Oct 6, 2025)
- ☆31, updated Oct 2, 2024
- GenRM-CoT: Data release for verification rationales (☆68, updated Oct 16, 2024)