openpsi-project / ReaLHF
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
☆255Updated 2 months ago
Alternatives and similar repositories for ReaLHF:
Users who are interested in ReaLHF are comparing it to the libraries listed below
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆603Updated 2 months ago
- A flexible and efficient training framework for large-scale alignment tasks☆333Updated last month
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆161Updated last week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆216Updated last week
- ☆143Updated 2 weeks ago
- Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718☆313Updated 6 months ago
- ☆324Updated last month
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference☆463Updated 2 weeks ago
- Related work and background techniques for OpenAI o1☆217Updated 2 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆301Updated 7 months ago
- A series of technical reports on Slow Thinking with LLM☆615Updated this week
- REST: Retrieval-Based Speculative Decoding, NAACL 2024☆198Updated 4 months ago
- ☆262Updated 2 weeks ago
- Paper list for Efficient Reasoning.☆331Updated this week
- Repo of paper "Free Process Rewards without Process Labels"☆138Updated 2 weeks ago
- Reproducing R1 for Code with Reliable Rewards☆140Updated 3 weeks ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)☆245Updated last week
- Ring attention implementation with flash attention☆721Updated last month
- ☆574Updated 2 weeks ago
- ☆216Updated this week
- [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.☆433Updated 8 months ago
- Repository of LV-Eval Benchmark☆61Updated 7 months ago
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆357Updated 2 months ago
- ☆171Updated last month
- Distributed RL System for LLM Reasoning☆201Updated 3 weeks ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆162Updated 2 weeks ago
- ☆318Updated 8 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆171Updated 3 weeks ago
- A Survey on Efficient Reasoning for LLMs☆204Updated this week
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"☆395Updated 5 months ago