code-rag-bench / code-rag-bench
CodeRAG-Bench: Can Retrieval Augment Code Generation?
☆54Updated 2 months ago
Related projects:
- Repoformer: Selective Retrieval for Repository-Level Code Completion (ICML 2024)☆33Updated 2 months ago
- ☆48Updated 3 months ago
- ☆39Updated 3 months ago
- ☆25Updated last week
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…☆100Updated 3 months ago
- A reading list on LLM based Synthetic Data Generation 🔥☆105Updated last month
- ☆52Updated 2 months ago
- A Comprehensive Benchmark for Software Development.☆84Updated 3 months ago
- Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token☆80Updated 2 months ago
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)☆114Updated last month
- ☆45Updated 2 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆45Updated 6 months ago
- Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"☆28Updated 6 months ago
- [ACL 2024] AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning☆162Updated 5 months ago
- Code and demo program for LLMs with self-verification☆45Updated 11 months ago
- InstructCoder (formerly CodeInstruct) enables LLMs to edit code☆47Updated 6 months ago
- ☆17Updated 3 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't…☆82Updated 2 months ago
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories☆38Updated last month
- ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios☆63Updated 5 months ago
- ☆32Updated 2 weeks ago
- Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models☆54Updated last month
- Source code of DRAGIN, ACL 2024 main conference Long Paper☆60Updated this week
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.☆101Updated this week
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios☆36Updated 5 months ago
- ☆170Updated last month
- Repo-Level Code generation papers☆66Updated 3 months ago
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".☆52Updated 2 months ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs.☆36Updated 2 months ago
- [LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization☆25Updated this week