microsoft / debug-gym
A Text-Based Environment for Interactive Debugging
☆289Updated this week
Alternatives and similar repositories for debug-gym
Users who are interested in debug-gym are comparing it to the libraries listed below.
- ☆80Updated 3 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.☆409Updated last week
- Benchmark and optimize LLM inference across frameworks with ease☆155Updated 4 months ago
- ☆59Updated 11 months ago
- Agent computer interface for AI software engineer.☆115Updated last month
- ☆236Updated last month
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling☆512Updated last month
- A better way of testing, inspecting, and analyzing AI Agent traces.☆40Updated last week
- A multi-agent LLM system for detecting and resolving cognitive dissonance.☆271Updated 3 months ago
- ☆36Updated 11 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆256Updated last week
- ☆195Updated 5 months ago
- ☆85Updated 4 months ago
- Test Generation for Prompts☆148Updated this week
- Harbor is a framework for running agent evaluations and creating and using RL environments.☆381Updated this week
- Public repository containing METR's DVC pipeline for eval data analysis☆183Updated this week
- Code to accompany the Universal Deep Research paper (https://arxiv.org/abs/2509.00244)☆458Updated 4 months ago
- Catch MCP server issues before your agents do.☆138Updated last month
- An alignment auditing agent capable of quickly exploring alignment hypothesis☆804Updated last week
- 🧬 The Huxley-Gödel Machine☆317Updated last month
- A Python library that uses Reinforcement Learning (RL) to train LLMs.☆42Updated 5 months ago
- Super basic implementation (gist-like) of RLMs with REPL environments.☆435Updated last week
- A clean, modular SDK for building AI agents with OpenHands V1.☆432Updated this week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents☆514Updated last week
- Run Surfer-H agents powered by Holo1 using the Surfer-H-CLI. Includes example tasks, scripts, and configurations.☆146Updated last month
- Coding problems used in aider's polyglot benchmark☆199Updated last year
- The theory of mind module for the SWE agent☆67Updated this week
- DSPy module for OpenAI Codex SDK - signature-driven agentic workflows☆148Updated last month
- ☆148Updated 5 months ago
- Verifiers for LLM Reinforcement Learning☆80Updated 4 months ago