OpenDevin / OD-SWE-bench
Enhanced fork of SWE-bench, tailored for OpenDevin's ecosystem.
☆20Updated 5 months ago
Related projects
Alternatives and complementary repositories for OD-SWE-bench
- Aider's refactoring benchmark exercises based on popular python repos☆45Updated last month
- Contains the model patches and the eval logs from the passing swe-bench-lite run.☆10Updated 4 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆75Updated last month
- Harness used to benchmark aider against SWE Bench benchmarks☆53Updated 4 months ago
- Data preparation code for CrystalCoder 7B LLM☆42Updated 6 months ago
- ☆37Updated 11 months ago
- ☆82Updated 4 months ago
- A desktop for AI agents☆28Updated this week
- A collection of pre-built wrappers over common RAG systems like ChromaDB, Weaviate, Pinecone, and others!☆20Updated last week
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!☆47Updated this week
- ☆78Updated 11 months ago
- Tools for formatting large language model prompts.☆12Updated 11 months ago
- ☆20Updated 8 months ago
- ☆37Updated this week
- Pre-training code for CrystalCoder 7B LLM☆53Updated 6 months ago
- Evaluating tool-augmented LLMs in conversation settings☆72Updated 5 months ago
- ☆11Updated last year
- Nexusflow function call, tool use, and agent benchmarks.☆14Updated this week
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min…☆23Updated last week
- Generate High Quality textual or multi-modal datasets with Agents☆17Updated last year
- o1 Chain of Thought Examples☆13Updated last month
- ☆153Updated 2 months ago
- LangChain + LiteLLM that works☆25Updated 3 weeks ago
- Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024☆52Updated 2 months ago
- ☆72Updated last year
- never forget anything again! combine AI and intelligent tooling for a local knowledge base to track catalogue, annotate, and plan for you…☆32Updated 6 months ago
- ToK aka Tree of Knowledge for Large Language Models LLM. It's a novel dataset that inspires knowledge symbolic correlation in simple inpu…☆46Updated last year
- ☆24Updated 10 months ago
- ☆116Updated 5 months ago
- Gentopia Agent Zoo and Agent Benchmark☆29Updated last year