CognitionAI / devin-swebench-results
Cognition's results and methodology on SWE-bench
☆119 Updated last year
Alternatives and similar repositories for devin-swebench-results
Users who are interested in devin-swebench-results are comparing it to the repositories listed below
- Harness used to benchmark aider against SWE Bench benchmarks ☆71 Updated last year
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆206 Updated this week
- accompanying material for sleep-time compute paper ☆105 Updated 3 months ago
- Just a bunch of benchmark logs for different LLMs ☆120 Updated last year
- ☆41 Updated last year
- Track the progress of LLM context utilisation ☆55 Updated 4 months ago
- Public repository containing METR's DVC pipeline for eval data analysis ☆104 Updated 4 months ago
- ☆108 Updated 2 months ago
- A codebase for "Language Models can Solve Computer Tasks" ☆234 Updated last year
- Run SWE-bench evaluations remotely ☆40 Updated 2 weeks ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆212 Updated last year
- This repository contains all the code for collecting large scale amounts of code from GitHub. ☆110 Updated 2 years ago
- ☆85 Updated 2 years ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆222 Updated last year
- ☆112 Updated 3 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆97 Updated last year
- Accepted by Transactions on Machine Learning Research (TMLR) ☆130 Updated 10 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆105 Updated 8 months ago
- Google DeepMind's PromptBreeder for automated prompt engineering, implemented in LangChain Expression Language. ☆142 Updated last year
- Pre-training code for CrystalCoder 7B LLM ☆55 Updated last year
- ☆55 Updated 2 months ago
- RepoQA: Evaluating Long-Context Code Understanding ☆115 Updated 9 months ago
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆61 Updated 4 months ago
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆40 Updated last year
- CodeGen2 models for program synthesis ☆271 Updated 2 years ago
- Prototype advanced LLM algorithms for reasoning and planning. ☆96 Updated last year
- LILO: Library Induction with Language Observations ☆88 Updated last year
- ☆159 Updated last year
- Multimodal computer agent data collection program ☆145 Updated last year
- Evaluating tool-augmented LLMs in conversation settings ☆86 Updated last year