microsoft / debug-gym
A Text-Based Environment for Interactive Debugging
☆199 Updated this week
Alternatives and similar repositories for debug-gym
Users who are interested in debug-gym are comparing it to the libraries listed below.
- ☆65 Updated 2 months ago
- ☆125 Updated last month
- ☆36 Updated 3 months ago
- Verdict is a library for scaling judge-time compute. ☆211 Updated 2 weeks ago
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆35 Updated this week
- Official repository for "DynaSaur: Large Language Agents Beyond Predefined Actions" ☆339 Updated 4 months ago
- ⚖️ Awesome LLM Judges ⚖️ ☆97 Updated 2 weeks ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆81 Updated last month
- ☆86 Updated 4 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆65 Updated last month
- Train your own SOTA deductive reasoning model ☆92 Updated 2 months ago
- Conduct in-depth research with AI-driven insights: DeepDive is a command-line tool that leverages web searches and AI models to generate… ☆42 Updated 8 months ago
- Coding problems used in aider's polyglot benchmark ☆115 Updated 4 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 4 months ago
- 🤖 Headless IDE for AI agents ☆186 Updated 3 weeks ago
- ☆49 Updated last month
- ☆54 Updated 3 months ago
- ☆77 Updated 6 months ago
- ☆138 Updated last month
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 2 months ago
- An agent benchmark with tasks in a simulated software company. ☆350 Updated this week
- Letting Claude Code develop his own MCP tools :) ☆100 Updated 2 months ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆75 Updated 2 weeks ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆182 Updated this week
- Minimal example of MCP for parsing llms.txt ☆38 Updated last month
- An MCP Server that's also an MCP Client. Useful for letting Claude develop and test MCPs without needing to reset the application. ☆117 Updated 2 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆31 Updated 3 weeks ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆129 Updated 2 weeks ago
- ☆151 Updated 5 months ago
- ☆78 Updated this week