FloridSleeves / LLMDebugger

LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step (ACL'24)

☆554

Alternatives and similar repositories for LLMDebugger

Users who are interested in LLMDebugger are comparing it to the repositories listed below.

huangd1999 / AgentCoder
This repo is the official implementation of AgentCoder and AgentCoder+.
☆340 Updated 5 months ago
aorwall / moatless-tools
☆533 Updated last month
lapisrocks / LanguageAgentTreeSearch
[ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models"
☆768 Updated last year
OpenAutoCoder / Agentless
Agentless🐱: an agentless approach to automatically solve software development problems
☆1,846 Updated 7 months ago
Md-Ashraful-Pramanik / MapCoder
MapCoder: Multi-Agent Code Generation for Competitive Problem Solving
☆160 Updated 5 months ago
ozyyshr / RepoGraph
Enhancing AI Software Engineering with Repository-level Code Graph
☆197 Updated 4 months ago
facebookresearch / swe-rl
Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
☆573 Updated 4 months ago
bigcode-project / bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
☆409 Updated 3 months ago
SalesforceAIResearch / xLAM
xLAM: A Family of Large Action Models to Empower AI Agent Systems
☆513 Updated this week
microsoft / CodeT
☆661 Updated 9 months ago
microsoft / Trace
End-to-end Generative Optimization for AI Agents
☆633 Updated last month
SWE-Gym / SWE-Gym
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
☆516 Updated last week
LiveCodeBench / LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
☆608 Updated 3 weeks ago
SalesforceAIResearch / AgentLite
☆618 Updated 6 months ago
bigcode-project / selfcodealign
[NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation
☆309 Updated 5 months ago
sierra-research / tau-bench
Code and Data for Tau-Bench
☆713 Updated 3 weeks ago
SWE-agent / SWE-ReX
Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
☆273 Updated last week
SWE-bench / experiments
Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
☆201 Updated 3 weeks ago
SWE-bench / SWE-smith
Scaling Data for SWE-agents
☆328 Updated this week
abacaj / code-eval
Run evaluation on LLMs using human-eval benchmark
☆417 Updated last year
zorazrw / agent-workflow-memory
AWM: Agent Workflow Memory
☆300 Updated 6 months ago
r2e-project / r2e
r2e: turn any GitHub repository into a programming agent environment
☆129 Updated 3 months ago
Leolty / repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆169 Updated 11 months ago
NL2Code / CodeR
☆159 Updated 11 months ago
TheAgentCompany / TheAgentCompany
An agent benchmark with tasks in a simulated software company.
☆515 Updated last week
code-rag-bench / code-rag-bench
CodeRAG-Bench: Can Retrieval Augment Code Generation?
☆150 Updated 8 months ago
xlang-ai / DS-1000
[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".
☆251 Updated 9 months ago
trotsky1997 / MathBlackBox
☆1,028 Updated 7 months ago
ServiceNow / AgentLab
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re…
☆372 Updated this week
bigcode-project / octopack
🐙 OctoPack: Instruction Tuning Code Large Language Models
☆472 Updated 6 months ago