codezakh / DataEnvGym
A testbed for agents and environments that can automatically improve models through data generation.
☆27 · Updated 10 months ago
Alternatives and similar repositories for DataEnvGym
Users who are interested in DataEnvGym are comparing it to the libraries listed below.
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆39 · Updated last year
- ☆33 · Updated last year
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆63 · Updated this week
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆55 · Updated 6 months ago
- ☆50 · Updated 10 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ☆25 · Updated 10 months ago
- Implementation of Dualformer ☆24 · Updated 10 months ago
- ☆65 · Updated 10 months ago
- Natural Language Reinforcement Learning ☆101 · Updated 5 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆113 · Updated 5 months ago
- ☆46 · Updated 6 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆147 · Updated last year
- The official implementation of Self-Exploring Language Models (SELM) ☆63 · Updated last year
- ☆79 · Updated 2 months ago
- ☆88 · Updated 2 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆41 · Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ☆152 · Updated 11 months ago
- ☆117 · Updated 11 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆38 · Updated last year
- Reinforcing General Reasoning without Verifiers ☆93 · Updated 6 months ago
- ☆28 · Updated 2 months ago
- ☆29 · Updated 10 months ago
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆41 · Updated 2 years ago
- Defeating the Training-Inference Mismatch via FP16 ☆172 · Updated last month
- ☆52 · Updated 9 months ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) ☆73 · Updated last year
- ☆112 · Updated last year
- ☆51 · Updated 8 months ago
- Lottery Ticket Adaptation ☆40 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆87 · Updated last year