NL2Code / CodeR
☆152 Updated 2 months ago
Related projects
Alternatives and complementary repositories for CodeR
- ☆264 Updated this week
- Enhancing AI Software Engineering with Repository-level Code Graph ☆94 Updated 2 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆212 Updated last month
- ☆81 Updated 4 months ago
- Environments, tools, and benchmarks for general computer agents ☆172 Updated 3 weeks ago
- ☆103 Updated 3 months ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆204 Updated this week
- [ACL 2024] AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning ☆178 Updated last month
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆110 Updated 5 months ago
- An implementation of Everything of Thoughts (XoT). ☆132 Updated 8 months ago
- ☆116 Updated 5 months ago
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and … ☆328 Updated 5 months ago
- 🤠 Agent-as-a-Judge and DevAI dataset ☆192 Updated this week
- AWM: Agent Workflow Memory ☆205 Updated last month
- CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/ ☆191 Updated last week
- CodeRAG-Bench: Can Retrieval Augment Code Generation? ☆84 Updated this week
- My implementation of "Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models" ☆92 Updated last year
- The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation" ☆99 Updated 3 weeks ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆64 Updated this week
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038) ☆170 Updated last month
- An Analytical Evaluation Board of Multi-turn LLM Agents ☆250 Updated 6 months ago
- ☆287 Updated 2 months ago
- Harness used to benchmark aider against SWE Bench benchmarks ☆53 Updated 4 months ago
- ☆146 Updated 3 months ago
- Open Source WizardCoder Dataset ☆153 Updated last year
- This is the official repo for "PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization". PromptAgen… ☆204 Updated 3 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆100 Updated this week
- FireAct: Toward Language Agent Fine-tuning ☆255 Updated last year
- ☆127 Updated 3 months ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆194 Updated 6 months ago