microsoft / debug-gym
A Text-Based Environment for Interactive Debugging
☆293Updated this week
Alternatives and similar repositories for debug-gym
Users who are interested in debug-gym are comparing it to the libraries listed below
- ☆80Updated 4 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.☆424Updated last week
- A better way of testing, inspecting, and analyzing AI Agent traces.☆46Updated 3 weeks ago
- An alignment auditing agent capable of quickly exploring alignment hypotheses☆874Updated this week
- Agent computer interface for AI software engineer.☆115Updated last month
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆261Updated this week
- ☆237Updated 2 months ago
- Public repository containing METR's DVC pipeline for eval data analysis☆189Updated last week
- Benchmark and optimize LLM inference across frameworks with ease☆161Updated 4 months ago
- Verifiers for LLM Reinforcement Learning☆81Updated 4 months ago
- Coding problems used in aider's polyglot benchmark☆199Updated last year
- Official CLI and Python SDK for Prime Intellect - access GPU compute, remote sandboxes, RL environments, and distributed training infrast…☆148Updated this week
- The theory of mind module for the SWE agent☆71Updated 3 weeks ago
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents☆538Updated this week
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?☆251Updated 3 weeks ago
- Harbor is a framework for running agent evaluations and creating and using RL environments.☆542Updated this week
- Test Generation for Prompts☆149Updated 2 weeks ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)☆90Updated last month
- Code to accompany the Universal Deep Research paper (https://arxiv.org/abs/2509.00244)☆459Updated 5 months ago
- ☆59Updated last year
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling☆515Updated last month
- A clean, modular SDK for building AI agents with OpenHands V1.☆476Updated this week
- Catch MCP server issues before your agents do.☆142Updated last month
- ☆43Updated 2 months ago
- This repository contains the toolkit for replicating results from our technical report.☆200Updated 5 months ago
- A system that tries to resolve all issues on a github repo with OpenHands.☆117Updated last year
- A Python library that uses Reinforcement Learning (RL) to train LLMs.☆42Updated 6 months ago
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers"☆97Updated 3 months ago
- ☆85Updated 5 months ago
- ☆106Updated last year