CognitionAI / devin-swebench-results
Cognition's results and methodology on SWE-bench
☆122 · Updated last year
Alternatives and similar repositories for devin-swebench-results
Users interested in devin-swebench-results are comparing it to the libraries listed below.
- Harness used to benchmark aider against SWE Bench benchmarks ☆78 · Updated last year
- Run SWE-bench evaluations remotely ☆44 · Updated 3 months ago
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆110 · Updated 11 months ago
- Pre-training code for CrystalCoder 7B LLM ☆55 · Updated last year
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆156 · Updated 9 months ago
- ☆126 · Updated last year
- ☆85 · Updated 2 years ago
- ☆41 · Updated last year
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆225 · Updated this week
- A codebase for "Language Models can Solve Computer Tasks" ☆237 · Updated last year
- Accepted by Transactions on Machine Learning Research (TMLR) ☆136 · Updated last year
- Evaluating LLMs with CommonGen-Lite ☆93 · Updated last year
- Public Inflection Benchmarks ☆68 · Updated last year
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆230 · Updated last year
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆43 · Updated 2 years ago
- LLM reads a paper and produces a working prototype ☆60 · Updated 7 months ago
- This repository contains all the code for collecting large-scale amounts of code from GitHub. ☆110 · Updated 2 years ago
- A set of utilities for running few-shot prompting experiments on large language models ☆126 · Updated 2 years ago
- ☆102 · Updated last year
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆102 · Updated 4 months ago
- Complex question answering in LLMs with enhanced reasoning and information-seeking capabilities. ☆203 · Updated 2 years ago
- ☆126 · Updated 6 months ago
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆138 · Updated 9 months ago
- Multimodal computer agent data collection program ☆152 · Updated 2 years ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆131 · Updated last year
- Reasoning by Communicating with Agents ☆29 · Updated 7 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆117 · Updated last month
- Track the progress of LLM context utilisation ☆55 · Updated 7 months ago
- ☆102 · Updated last year