microsoft / debug-gym
A Text-Based Environment for Interactive Debugging
☆272 Updated this week
Alternatives and similar repositories for debug-gym
Users that are interested in debug-gym are comparing it to the libraries listed below
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆327 Updated this week
- ☆78 Updated last week
- ☆146 Updated 2 months ago
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆40 Updated last week
- ☆232 Updated 3 months ago
- ☆57 Updated 8 months ago
- An alignment auditing agent capable of quickly exploring alignment hypotheses ☆364 Updated this week
- Context Engineering Course with DSPy ☆189 Updated 2 months ago
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling ☆475 Updated 2 months ago
- Public repository containing METR's DVC pipeline for eval data analysis ☆117 Updated 6 months ago
- MCP-Universe is a comprehensive framework designed for developing, testing, and benchmarking AI agents ☆445 Updated this week
- Verifiers for LLM Reinforcement Learning ☆75 Updated 3 weeks ago
- A framework for optimizing DSPy programs with RL ☆191 Updated this week
- ☆78 Updated last month
- Coding problems used in aider's polyglot benchmark ☆182 Updated 9 months ago
- Code to accompany the Universal Deep Research paper (https://arxiv.org/abs/2509.00244) ☆441 Updated last month
- An Automatic Prompt Optimization Framework for Large Language Models ☆122 Updated 2 months ago
- frozen-in-time version of our Paper Finder agent for reproducing evaluation results ☆187 Updated last month
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆33 Updated 5 months ago
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆414 Updated last week
- Benchmark and optimize LLM inference across frameworks with ease ☆113 Updated 3 weeks ago
- A multi-agent LLM system for detecting and resolving cognitive dissonance. ☆267 Updated last month
- ☆36 Updated 8 months ago
- A Python library that uses Reinforcement Learning (RL) to train LLMs. ☆41 Updated 2 months ago
- This repository contains the toolkit for replicating results from our technical report. ☆138 Updated last month
- ☆68 Updated 4 months ago
- Together Open Deep Research ☆352 Updated 5 months ago
- Catch MCP server issues before your agents do. ☆106 Updated last week
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆181 Updated this week
- ☆104 Updated 3 months ago