SWE-bench / swe-bench.github.io
Landing page + leaderboard for the SWE-bench benchmark
☆6 · Updated this week
Alternatives and similar repositories for swe-bench.github.io
Users interested in swe-bench.github.io are comparing it to the repositories listed below
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆12 · Updated last month
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ☆30 · Updated this week
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 · Updated last month
- ☆19 · Updated 7 months ago
- Aioli: A unified optimization framework for language model data mixing ☆27 · Updated 4 months ago
- ☆16 · Updated 5 months ago
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆40 · Updated last year
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆13 · Updated 5 months ago
- ☆26 · Updated 10 months ago
- Harness used to benchmark aider against SWE-bench benchmarks ☆72 · Updated 11 months ago
- Training and Benchmarking LLMs for Code Preference. ☆33 · Updated 6 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆21 · Updated 2 months ago
- ☆17 · Updated last month
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆16 · Updated last year
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test generation ☆49 · Updated last month
- Training hybrid models for dummies. ☆21 · Updated 4 months ago
- ☆50 · Updated last week
- Small, simple agent task environments for training and evaluation ☆18 · Updated 7 months ago
- CodeNav is an LLM agent that navigates and leverages previously unseen code repositories to solve user queries. ☆48 · Updated 9 months ago
- ☆83 · Updated last month
- ☆21 · Updated 3 weeks ago
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆42 · Updated 10 months ago
- ☆29 · Updated last year
- Agentless Lite: RAG-based SWE-bench software engineering scaffold ☆29 · Updated last month
- A framework for evaluating the effectiveness of chain-of-thought reasoning in language models. ☆17 · Updated 4 months ago
- ☆25 · Updated this week
- LangCode - Improving alignment and reasoning of large language models (LLMs) with natural language embedded programs (NLEP). ☆42 · Updated last year
- Lottery Ticket Adaptation ☆39 · Updated 6 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆34 · Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs ☆46 · Updated last year