CognitionAI / devin-swebench-results
Cognition's results and methodology on SWE-bench
☆123Updated last year
Alternatives and similar repositories for devin-swebench-results
Users who are interested in devin-swebench-results are comparing it to the repositories listed below
- Harness used to benchmark aider against SWE Bench benchmarks☆78Updated last year
- Just a bunch of benchmark logs for different LLMs☆119Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification☆111Updated last year
- ☆85Updated 2 years ago
- ☆41Updated last year
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀☆102Updated 4 months ago
- Evaluating LLMs with CommonGen-Lite☆93Updated last year
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, …☆43Updated 2 years ago
- Track the progress of LLM context utilisation☆55Updated 8 months ago
- Run SWE-bench evaluations remotely☆47Updated 4 months ago
- Pre-training code for CrystalCoder 7B LLM☆55Updated last year
- ☆128Updated 6 months ago
- A set of utilities for running few-shot prompting experiments on large-language models☆126Updated 2 years ago
- A codebase for "Language Models can Solve Computer Tasks"☆239Updated last year
- Public Inflection Benchmarks☆68Updated last year
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models☆100Updated 2 years ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆66Updated last year
- This repository contains all the code for collecting large amounts of code from GitHub.☆110Updated 2 years ago
- ☆126Updated last year
- ☆130Updated 7 months ago
- Reasoning by Communicating with Agents☆29Updated 7 months ago
- A hard gym for programming☆163Updated last year
- A system that tries to resolve all issues on a GitHub repo with OpenHands.☆117Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions."☆66Updated 2 years ago
- Multimodal computer agent data collection program☆153Updated 3 weeks ago
- Accompanying material for the sleep-time compute paper☆118Updated 7 months ago
- ☆102Updated last year
- Doing simple retrieval from LLM models at various context lengths to measure accuracy☆107Updated 3 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"☆118Updated 2 months ago
- Complex question answering in LLMs with enhanced reasoning and information-seeking capabilities.☆204Updated 2 years ago