CognitionAI / devin-swebench-results
Cognition's results and methodology on SWE-bench
☆120 Updated last year
Alternatives and similar repositories for devin-swebench-results
Users who are interested in devin-swebench-results are comparing it to the repositories listed below.
- Just a bunch of benchmark logs for different LLMs ☆118 Updated last year
- Harness used to benchmark aider against SWE Bench benchmarks ☆75 Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆109 Updated 10 months ago
- ☆85 Updated 2 years ago
- Public Inflection Benchmarks ☆68 Updated last year
- Pre-training code for CrystalCoder 7B LLM ☆55 Updated last year
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆42 Updated last year
- Evaluating LLMs with CommonGen-Lite ☆91 Updated last year
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆226 Updated last year
- A codebase for "Language Models can Solve Computer Tasks" ☆235 Updated last year
- A set of utilities for running few-shot prompting experiments on large-language models ☆123 Updated last year
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆218 Updated this week
- ☆41 Updated last year
- RepoQA: Evaluating Long-Context Code Understanding ☆119 Updated 11 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆99 Updated 2 years ago
- The data processing pipeline for the Koala chatbot language model ☆118 Updated 2 years ago
- Run SWE-bench evaluations remotely ☆41 Updated 2 months ago
- Repository for analysis and experiments in the BigCode project. ☆124 Updated last year
- ☆99 Updated last year
- This repository contains all the code for collecting large scale amounts of code from GitHub. ☆109 Updated 2 years ago
- Accepted by Transactions on Machine Learning Research (TMLR) ☆132 Updated last year
- ☆149 Updated last year
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆311 Updated last year
- My implementation of "Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models" ☆98 Updated 2 years ago
- LILO: Library Induction with Language Observations ☆88 Updated last year
- ☆125 Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆64 Updated 2 years ago
- A hard gym for programming ☆161 Updated last year
- Can Language Models Solve Olympiad Programming? ☆118 Updated 9 months ago
- Complex question answering in LLMs with enhanced reasoning and information-seeking capabilities. ☆201 Updated last year