microsoft / debug-gym
A Text-Based Environment for Interactive Debugging
☆286 Updated last week
Alternatives and similar repositories for debug-gym
Users who are interested in debug-gym are comparing it to the libraries listed below.
- Super basic implementation (gist-like) of RLMs with REPL environments. ☆290 Updated 2 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆401 Updated this week
- ☆79 Updated 3 months ago
- Catch MCP server issues before your agents do. ☆138 Updated 2 weeks ago
- An alignment auditing agent capable of quickly exploring alignment hypotheses ☆759 Updated 2 weeks ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆246 Updated last week
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆40 Updated 2 months ago
- A multi-agent LLM system for detecting and resolving cognitive dissonance. ☆270 Updated 2 months ago
- Verifiers for LLM Reinforcement Learning ☆79 Updated 3 months ago
- Harbor is a framework for running agent evaluations and creating and using RL environments. ☆241 Updated this week
- Benchmark and optimize LLM inference across frameworks with ease ☆151 Updated 3 months ago
- ☆59 Updated 11 months ago
- Context Engineering Course with DSPy ☆206 Updated 5 months ago
- ☆235 Updated last month
- ☆235 Updated last week
- ☆188 Updated 5 months ago
- A framework for optimizing DSPy programs with RL ☆302 Updated last month
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling ☆509 Updated 3 weeks ago
- ☆85 Updated 3 months ago
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆492 Updated last week
- Agent computer interface for AI software engineer. ☆113 Updated 3 weeks ago
- Public repository containing METR's DVC pipeline for eval data analysis ☆168 Updated 8 months ago
- DSPy module for OpenAI Codex SDK - signature-driven agentic workflows ☆146 Updated 3 weeks ago
- Test Generation for Prompts ☆145 Updated 3 weeks ago
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆236 Updated last month
- Evolve your language agent with Agentic Context Engineering (ACE) ☆439 Updated last month
- Run Surfer-H agents powered by Holo1 using the Surfer-H-CLI. Includes example tasks, scripts, and configurations. ☆145 Updated 2 weeks ago
- MCP-Universe is a comprehensive framework designed for developing, testing, and benchmarking AI agents ☆527 Updated last week
- 🧬 The Huxley-Gödel Machine ☆314 Updated last month
- ☆686 Updated last week