NL2Code / CodeR
☆155 Updated 6 months ago
Alternatives and similar repositories for CodeR:
Users that are interested in CodeR are comparing it to the libraries listed below
- ☆87 Updated 8 months ago
- Enhancing AI Software Engineering with Repository-level Code Graph ☆146 Updated 2 months ago
- ☆367 Updated last month
- [ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning ☆215 Updated 2 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆121 Updated 9 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆155 Updated 2 weeks ago
- ☆72 Updated 2 months ago
- AWM: Agent Workflow Memory ☆252 Updated last month
- My implementation of "Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models" ☆98 Updated last year
- Official implementation of paper How to Understand Whole Repository? New SOTA on SWE-bench Lite (21.3%) ☆72 Updated 4 months ago
- CodeRAG-Bench: Can Retrieval Augment Code Generation? ☆119 Updated 4 months ago
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038) ☆177 Updated last week
- ☆120 Updated 9 months ago
- An implementation of Everything of Thoughts (XoT). ☆141 Updated last year
- ☆80 Updated last month
- ☆117 Updated 7 months ago
- ☆90 Updated 6 months ago
- Harness used to benchmark aider against SWE Bench benchmarks ☆67 Updated 8 months ago
- A Comprehensive Benchmark for Software Development. ☆100 Updated 9 months ago
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆103 Updated last month
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆210 Updated 10 months ago
- RepoQA: Evaluating Long-Context Code Understanding ☆106 Updated 4 months ago
- Open Source WizardCoder Dataset ☆156 Updated last year
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement ☆80 Updated last month
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆90 Updated 5 months ago
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆83 Updated this week
- Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024 ☆65 Updated 6 months ago
- From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging ☆71 Updated 5 months ago
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024 ☆148 Updated 7 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆291 Updated 10 months ago