CognitionAI / devin-swebench-results
Cognition's results and methodology on SWE-bench
☆118 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for devin-swebench-results
- Open-sourced predictions, execution logs, trajectories, and results from model inference and evaluation runs on the SWE-bench task. ☆103 · Updated this week
- Harness used to benchmark Aider against the SWE-bench benchmark ☆53 · Updated 4 months ago
- ☆103 · Updated 3 months ago
- ☆153 · Updated 2 months ago
- Aider's refactoring benchmark exercises, based on popular Python repos ☆45 · Updated last month
- Just a bunch of benchmark logs for different LLMs ☆116 · Updated 3 months ago
- r2e: turn any GitHub repository into a programming agent environment ☆89 · Updated 3 weeks ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆96 · Updated 2 months ago
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆40 · Updated 3 months ago
- Track the progress of LLM context utilisation ☆53 · Updated 4 months ago
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆63 · Updated 3 weeks ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆97 · Updated last year
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆47 · Updated last month
- Evaluating tool-augmented LLMs in conversation settings ☆72 · Updated 5 months ago
- Accepted by Transactions on Machine Learning Research (TMLR) ☆120 · Updated last month
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement ☆47 · Updated 3 weeks ago
- ☆82 · Updated 4 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆63 · Updated last month
- A new benchmark for measuring LLMs' capability to detect bugs in large codebases. ☆27 · Updated 5 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆140 · Updated 3 months ago
- Evaluating LLMs with CommonGen-Lite ☆85 · Updated 8 months ago
- Public Inflection Benchmarks ☆69 · Updated 8 months ago
- [NeurIPS 2023 D&B] Code repository for the InterCode benchmark (https://arxiv.org/abs/2306.14898) ☆194 · Updated 6 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆222 · Updated last month
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral at the ACL 2024 SRW ☆52 · Updated last month
- Client Code Examples, Use Cases and Benchmarks for the Enterprise h2oGPTe RAG-Based GenAI Platform ☆81 · Updated this week
- A set of utilities for running few-shot prompting experiments on large language models ☆113 · Updated last year
- ☆66 · Updated 2 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆111 · Updated 5 months ago
- Can Language Models Solve Olympiad Programming? ☆101 · Updated 3 months ago