RLHFlow / RLHFlow.github.io
Webpage for RLHFlow
☆9 Updated 3 weeks ago
Alternatives and similar repositories for RLHFlow.github.io:
Users who are interested in RLHFlow.github.io are comparing it to the libraries listed below.
- Directional Preference Alignment ☆56 Updated 5 months ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆23 Updated 5 months ago
- ☆55 Updated 3 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆49 Updated 4 months ago
- GenRM-CoT: Data release for verification rationales ☆47 Updated 4 months ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆36 Updated last month
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆42 Updated 6 months ago
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts ☆21 Updated 11 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆66 Updated 6 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆31 Updated 7 months ago
- ThinK: Thinner Key Cache by Query-Driven Pruning ☆15 Updated last week
- ☆80 Updated 11 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆127 Updated last week
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆121 Updated 3 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆48 Updated 2 months ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆50 Updated 8 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆102 Updated 11 months ago
- This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP… ☆20 Updated 2 months ago
- Lightweight Adapting for Black-Box Large Language Models ☆19 Updated last year
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization" ☆37 Updated 4 months ago
- ☆13 Updated 7 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆123 Updated last month
- ☆30 Updated 5 months ago
- [ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models ☆16 Updated 7 months ago
- Domain-specific preference (DSP) data and customized RM fine-tuning. ☆24 Updated 11 months ago
- ☆41 Updated 3 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆29 Updated 8 months ago