OpenDevin / OD-SWE-bench
Enhanced fork of SWE-bench, tailored for OpenDevin's ecosystem.
☆23 Updated 8 months ago
Alternatives and similar repositories for OD-SWE-bench:
Users who are interested in OD-SWE-bench are comparing it to the repositories listed below.
- Harness used to benchmark aider against SWE Bench benchmarks ☆66 Updated 7 months ago
- Aider's refactoring benchmark exercises based on popular python repos ☆57 Updated 4 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆25 Updated 3 months ago
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ☆10 Updated 7 months ago
- Agent computer interface for AI software engineer. ☆36 Updated this week
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 Updated 9 months ago
- Data preparation code for CrystalCoder 7B LLM ☆44 Updated 9 months ago
- Nexusflow function call, tool use, and agent benchmarks. ☆19 Updated 2 months ago
- RepoQA: Evaluating Long-Context Code Understanding ☆102 Updated 3 months ago
- Pre-training code for CrystalCoder 7B LLM ☆55 Updated 9 months ago
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆28 Updated this week
- ☆50 Updated 3 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆46 Updated 8 months ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 3 months ago
- LangChain + LiteLLM that works ☆37 Updated this week
- Enhancing AI Software Engineering with Repository-level Code Graph ☆133 Updated last month
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆89 Updated 3 weeks ago
- ☆73 Updated this week
- ☆34 Updated 2 months ago
- LLM finetuning ☆42 Updated last year
- ☆81 Updated last year
- Open Implementations of LLM Analyses ☆98 Updated 4 months ago
- ☆28 Updated 10 months ago
- 🌟 SwarmAgent: A framework for simulating social group dynamics using multi-agent collaboration, aiding insights into collective behavior… ☆10 Updated last year
- Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024 ☆62 Updated 5 months ago
- LLMs as Collaboratively Edited Knowledge Bases ☆44 Updated last year
- ☆36 Updated last year
- Self-hosted LLM chatbot arena, with yourself as the only judge ☆36 Updated last year