Aider-AI / refactor-benchmark
Aider's refactoring benchmark exercises based on popular Python repos
☆74 · Updated 8 months ago
Alternatives and similar repositories for refactor-benchmark
Users interested in refactor-benchmark are comparing it to the libraries listed below.
- Harness used to benchmark aider against SWE Bench benchmarks ☆72 · Updated 11 months ago
- ☆158 · Updated 9 months ago
- Proof-of-concept of Cursor's Instant Apply feature ☆82 · Updated 9 months ago
- Coding problems used in aider's polyglot benchmark ☆141 · Updated 6 months ago
- Agent-computer interface for an AI software engineer. ☆85 · Updated this week
- Simple Graph Memory for AI applications ☆86 · Updated last month
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated 10 months ago
- ☆157 · Updated 11 months ago
- Evaluating tool-augmented LLMs in conversation settings ☆85 · Updated last year
- Writing Blog Posts with Generative Feedback Loops! ☆48 · Updated last year
- An implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Updated last year
- Automatic fine-tuning of models with synthetic data ☆75 · Updated last year
- Sandboxed code execution for AI agents, locally or in the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆228 · Updated this week
- ☆22 · Updated 11 months ago
- ☆49 · Updated last year
- ☆73 · Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆81 · Updated 8 months ago
- ☆17 · Updated 5 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ☆87 · Updated 2 weeks ago
- ☆86 · Updated 2 weeks ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆80 · Updated 3 months ago
- A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents. With all the tests and code you… ☆73 · Updated 6 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 · Updated 5 months ago
- ☆84 · Updated 2 years ago
- Train your own SOTA deductive reasoning model ☆94 · Updated 3 months ago
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆42 · Updated 10 months ago
- Enhanced fork of SWE-bench, tailored for OpenDevin's ecosystem. ☆25 · Updated last year
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆123 · Updated 4 months ago
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems" ☆54 · Updated 4 months ago
- Function Calling Benchmark & Testing ☆87 · Updated 11 months ago