agentica-project / rllm
☆59Updated 3 weeks ago
Alternatives and similar repositories for rllm
Users who are interested in rllm are comparing it to the libraries listed below.
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation☆309Updated 4 months ago
- ☆274Updated 3 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆245Updated 4 months ago
- Async pipelined version of Verl☆117Updated 4 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning☆181Updated 3 weeks ago
- ☆313Updated 3 months ago
- Repo of paper "Free Process Rewards without Process Labels"☆162Updated 5 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆101Updated 3 weeks ago
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference☆218Updated last week
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"☆239Updated 3 months ago
- A version of verl to support tool use☆341Updated this week
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents☆153Updated last month
- Reproducing R1 for Code with Reliable Rewards☆251Updated 3 months ago
- ☆207Updated 6 months ago
- ☆204Updated 5 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆248Updated 3 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆321Updated last year
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆285Updated last month
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning☆244Updated 2 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆221Updated 5 months ago
- ☆191Updated 2 weeks ago
- SkyRL: A Modular Full-stack RL Library for LLMs☆765Updated this week
- ☆327Updated last month
- ☆197Updated last week
- Implementation of FP8/INT8 rollout for RL training without performance drop.☆155Updated this week
- ☆115Updated 7 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆191Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆112Updated 8 months ago
- A Comprehensive Survey on Long Context Language Modeling☆180Updated last month
- A tiny reproduction of DeepSeek-R1-Zero on two A100s.☆71Updated 6 months ago