agentica-project / verl-pipeline
Async pipelined version of Verl
☆91 Updated last month
Alternatives and similar repositories for verl-pipeline
Users that are interested in verl-pipeline are comparing it to the libraries listed below
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆186 Updated 3 months ago
- Reproducing R1 for Code with Reliable Rewards ☆201 Updated last month
- A Comprehensive Survey on Long Context Language Modeling ☆147 Updated 2 weeks ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆155 Updated 2 weeks ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆98 Updated last month
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆136 Updated 8 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆149 Updated 2 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆55 Updated 3 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆119 Updated this week
- Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆73 Updated last month
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆94 Updated 2 months ago
- A version of verl to support tool use ☆172 Updated this week
- Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718 ☆326 Updated 8 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆102 Updated 4 months ago
- ☆63 Updated 6 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆126 Updated last week
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 Updated 8 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆239 Updated last month
- ☆69 Updated 6 months ago
- GenRM-CoT: Data release for verification rationales ☆61 Updated 7 months ago
- Reproducible, flexible LLM evaluations ☆204 Updated 3 weeks ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆100 Updated last week
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆106 Updated 2 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆106 Updated 5 months ago
- ☆201 Updated 3 months ago
- ☆93 Updated 8 months ago
- Revisiting Mid-training in the Era of RL Scaling ☆48 Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆207 Updated last month
- Simple and efficient pytorch-native transformer training and inference (batched) ☆75 Updated last year
- Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆94 Updated last month