Pi3AI / DreamGym
This is an unofficial implementation of the DreamGym framework from the paper "Scaling Agent Learning via Experience Synthesis" (arXiv:2511.03773).
☆35 · Updated 2 months ago
Alternatives and similar repositories for DreamGym
Users interested in DreamGym are comparing it to the libraries listed below.
- ☆53 · Updated 11 months ago
- ☆87 · Updated 5 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆155 · Updated last year
- This repository is maintained to release the dataset and models for multimodal puzzle reasoning. ☆113 · Updated 11 months ago
- ☆219 · Updated 8 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model… ☆62 · Updated 7 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆148 · Updated 8 months ago
- ☆104 · Updated last year
- ☆54 · Updated 11 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆65 · Updated last year
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models ☆27 · Updated last year
- [EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning ☆70 · Updated 3 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … ☆101 · Updated last year
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆149 · Updated 3 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆53 · Updated 8 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆109 · Updated 8 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆65 · Updated last year
- ☆53 · Updated last year
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆158 · Updated 7 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆182 · Updated 6 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆51 · Updated last year
- Scaling Preference Data Curation via Human-AI Synergy ☆139 · Updated 7 months ago
- [ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, co… ☆34 · Updated 4 months ago
- ☆352 · Updated 6 months ago
- This is the implementation of LeCo ☆31 · Updated last year
- Official Repository of "AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning" ☆62 · Updated 2 months ago
- ☆109 · Updated 6 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] ☆216 · Updated 2 months ago
- m&ms: A Benchmark to Evaluate Tool-Use for multi-step multi-modal tasks ☆44 · Updated last year
- Repo of paper "Free Process Rewards without Process Labels" ☆168 · Updated 10 months ago