OpenDevin / OD-SWE-bench
Enhanced fork of SWE-bench, tailored for OpenDevin's ecosystem.
☆25 · Updated last year
Alternatives and similar repositories for OD-SWE-bench
Users interested in OD-SWE-bench are comparing it to the libraries listed below.
- Harness used to benchmark aider against SWE Bench benchmarks · ☆72 · Updated 11 months ago
- Agent computer interface for AI software engineer. · ☆85 · Updated this week
- 🧠 Societies of Mind & Economy of Minds · ☆60 · Updated 3 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… · ☆13 · Updated 2 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… · ☆26 · Updated 7 months ago
- ☆97 · Updated 11 months ago
- ToK aka Tree of Knowledge for Large Language Models LLM. It's a novel dataset that inspires knowledge symbolic correlation in simple inpu… · ☆54 · Updated 2 years ago
- ☆20 · Updated last year
- Aider's refactoring benchmark exercises based on popular python repos · ☆74 · Updated 8 months ago
- ☆158 · Updated 9 months ago
- My implementation of "Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models" · ☆98 · Updated last year
- [ACL25' Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline. · ☆40 · Updated last week
- LLM finetuning · ☆42 · Updated last year
- An open source ChatGPT UI for ToolLlama · ☆28 · Updated last year
- ☆36 · Updated 2 years ago
- Gentopia Agent Zoo and Agent Benchmark · ☆30 · Updated last year
- Data preparation code for CrystalCoder 7B LLM · ☆45 · Updated last year
- Documentation for XAgent. · ☆17 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆10 · Updated last year
- Contains the model patches and the eval logs from the passing swe-bench-lite run. · ☆10 · Updated 11 months ago
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! · ☆52 · Updated 3 months ago
- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings · ☆42 · Updated 4 months ago
- Official repo for the paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems" · ☆54 · Updated 4 months ago
- II-Thought-RL is our initial attempt at developing a large-scale, multi-domain Reinforcement Learning (RL) dataset · ☆20 · Updated 2 months ago
- Scaling Computer-Use Grounding via UI Decomposition and Synthesis · ☆79 · Updated this week
- ☆16 · Updated 5 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" · ☆108 · Updated 8 months ago
- ☆40 · Updated 11 months ago
- Safe Python code execution environment for language models · ☆15 · Updated 2 months ago
- ☆121 · Updated 10 months ago