shenao-zhang / SELM
The official implementation of Self-Exploring Language Models (SELM)
☆64 Updated last year
Alternatives and similar repositories for SELM
Users who are interested in SELM are comparing it to the repositories listed below.
- ☆120 Updated 6 months ago
- ☆100 Updated last year
- ☆101 Updated 11 months ago
- Reinforcing General Reasoning without Verifiers ☆79 Updated 2 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 Updated last year
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆60 Updated 6 months ago
- ☆85 Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆60 Updated 6 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆33 Updated 3 weeks ago
- Directional Preference Alignment ☆59 Updated 11 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆104 Updated last month
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆118 Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82 Updated 3 months ago
- ☆115 Updated 7 months ago
- ☆47 Updated 6 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆75 Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 Updated 4 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆34 Updated this week
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆141 Updated 9 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆111 Updated 7 months ago
- ☆34 Updated 7 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆101 Updated 2 months ago
- Natural Language Reinforcement Learning ☆95 Updated 3 weeks ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025] ☆170 Updated last month
- ☆61 Updated 5 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆179 Updated 2 months ago
- Exploration of automated dataset selection approaches at large scales. ☆47 Updated 5 months ago
- This is the official repository for Inheritune. ☆112 Updated 6 months ago
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆136 Updated 2 weeks ago
- ☆68 Updated last year