RAGEN-AI / VAGEN
☆39Updated this week
Alternatives and similar repositories for VAGEN:
Users who are interested in VAGEN are comparing it to the repositories listed below:
- ☆103Updated 2 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning☆84Updated last month
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆75Updated 2 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling☆95Updated 2 months ago
- Code for "Reasoning to Learn from Latent Thoughts"☆51Updated last week
- Repo of paper "Free Process Rewards without Process Labels"☆138Updated 2 weeks ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆46Updated 4 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings"☆51Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"☆140Updated this week
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆130Updated 4 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆119Updated 6 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆104Updated last week
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆78Updated 2 weeks ago
- ☆59Updated 3 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆132Updated 5 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models☆44Updated last month
- ☆43Updated 5 months ago
- GenRM-CoT: Data release for verification rationales☆53Updated 5 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆162Updated 2 weeks ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"☆47Updated 5 months ago
- ☆129Updated this week
- ☆59Updated 2 weeks ago
- Natural Language Reinforcement Learning☆84Updated 3 months ago
- ☆53Updated 3 weeks ago
- M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆56Updated 3 months ago
- ☆59Updated 6 months ago
- ☆41Updated 3 weeks ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"☆29Updated 9 months ago
- ☆84Updated last month
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks☆70Updated 2 weeks ago