microsoft / debug-gym
A Text-Based Environment for Interactive Debugging
☆14Updated this week
Alternatives and similar repositories for debug-gym:
Users who are interested in debug-gym are comparing it to the libraries listed below.
- A better way of testing, inspecting, and analyzing AI Agent traces.☆30Updated this week
- A text-to-SQL prototype on the northwind sqlite dataset☆12Updated 6 months ago
- LLM plugin for clustering embeddings☆72Updated last year
- Access the Cohere Command R family of models☆35Updated this week
- llm plugin for Cerebras fast inference API☆24Updated 3 weeks ago
- The developer starter pack for document processing☆14Updated this week
- Automated Capability Discovery via Foundation Model Self-Exploration☆42Updated last month
- An AI character interaction system with emotional modeling and advanced memory management☆16Updated 5 months ago
- Make tool-calling schemas for existing tools☆14Updated 3 weeks ago
- Let Claude control a web browser on your machine.☆18Updated last month
- LLM code editor for backend services☆14Updated 5 months ago
- The LLM plugins directory☆40Updated last year
- Demos of ChatGPT's function calling/structured data support.☆23Updated last year
- Plugin for LLM adding support for Google's PaLM 2 model☆14Updated last year
- ☆30Updated last year
- NLP with Rust for Python 🦀🐍☆61Updated 10 months ago
- Chat Markup Language conversation library☆55Updated last year
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.☆50Updated 3 weeks ago
- Embedding models from Jina AI☆58Updated last year
- Contains the model patches and the eval logs from the passing swe-bench-lite run.☆10Updated 9 months ago
- RAG for any docs hosted on readthedocs☆38Updated last year
- ☆17Updated last week
- Run evals using LLM☆22Updated 11 months ago
- A collection of tools for your LLMs that run on Modal☆16Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆53Updated 3 months ago
- Proxy server that converts Anthropic API requests to OpenAI format and sends it to OpenRouter. It's used to use Claude Code with OpenRout…☆53Updated 2 weeks ago
- The Granite Guardian models are designed to detect risks in prompts and responses.☆75Updated 2 weeks ago
- Conduct in-depth research with AI-driven insights: DeepDive is a command-line tool that leverages web searches and AI models to generate…☆39Updated 7 months ago
- Using modal.com to process FineWeb-edu data☆20Updated 3 weeks ago
- Very minimal (and stateless) agent framework☆41Updated 2 months ago