LingmaTongyi / Codev-Bench
Codev-Bench (Code Development Benchmark) is a fine-grained, real-world, repository-level, and developer-centric evaluation framework. It assesses whether a code completion tool can accurately capture a developer's immediate intent and suggest appropriate code snippets across diverse, fine-grained contexts.
☆26 · Updated 2 weeks ago
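To make the evaluation idea concrete, the snippet below sketches the kind of fine-grained check a completion benchmark performs: compare a tool's suggested snippet against the code the developer actually wrote. This is an illustrative sketch, not Codev-Bench's actual API; the function name and metrics are assumptions.

```python
import difflib

def score_completion(ground_truth: str, suggestion: str) -> dict:
    """Score a suggested completion against the developer's actual code.

    Hypothetical helper for illustration; Codev-Bench's real metrics
    and interfaces may differ.
    """
    exact = ground_truth.strip() == suggestion.strip()
    # Edit similarity: fraction of matching characters, from 0.0 to 1.0.
    similarity = difflib.SequenceMatcher(None, ground_truth, suggestion).ratio()
    return {"exact_match": exact, "edit_similarity": round(similarity, 3)}

truth = "return [x * 2 for x in items]"
print(score_completion(truth, "return [x * 2 for x in items]"))
print(score_completion(truth, "return list(map(lambda x: x * 2, items))"))
```

A semantically equivalent but differently worded suggestion scores zero on exact match while still earning partial credit on edit similarity, which is why fine-grained benchmarks typically report both kinds of metric.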
Related projects
Alternatives and complementary repositories for Codev-Bench
- ☆14 · Updated 2 months ago
- [ACL 2024] The project of Symbol-LLM ☆42 · Updated 4 months ago
- NaturalCodeBench (Findings of ACL 2024) ☆56 · Updated last month
- Large Language Models Meet NL2Code: A Survey ☆34 · Updated this week
- ☆39 · Updated 5 months ago
- CodeRAG-Bench: Can Retrieval Augment Code Generation? ☆84 · Updated last week
- ☆101 · Updated 5 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆51 · Updated 3 weeks ago
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories ☆46 · Updated 3 months ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆56 · Updated 2 months ago
- ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆62 · Updated 7 months ago
- A Comprehensive Benchmark for Software Development. ☆84 · Updated 5 months ago
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆18 · Updated last month
- Generate the WizardCoder Instruct from the CodeAlpaca ☆20 · Updated last year
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". ☆57 · Updated 4 months ago
- Training and Benchmarking LLMs for Code Preference. ☆25 · Updated last week
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆77 · Updated last month
- InstructCoder: Instruction Tuning Large Language Models for Code Editing (ACL 2024 SRW, oral) ☆52 · Updated last month
- ☆79 · Updated 7 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆99 · Updated 3 weeks ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆39 · Updated 4 months ago
- Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning" ☆97 · Updated 10 months ago
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) ☆122 · Updated 3 months ago
- ☆21 · Updated 5 months ago
- [ACL'24] Code and data of the paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆49 · Updated 9 months ago
- Collection of papers for scalable automated alignment. ☆73 · Updated last month
- Official implementation of the paper "How to Understand Whole Repository? New SOTA on SWE-bench Lite (21.3%)" ☆67 · Updated last week
- ☆26 · Updated 2 months ago
- Towards Systematic Measurement for Long Text Quality ☆29 · Updated 2 months ago
- A distributed, extensible, secure solution for evaluating machine-generated code with unit tests in multiple programming languages. ☆42 · Updated last month