LingmaTongyi / Codev-Bench
Codev-Bench (Code Development Benchmark) is a fine-grained, real-world, repository-level, developer-centric evaluation framework. It assesses whether a code completion tool can accurately capture a developer's immediate intent and suggest appropriate code snippets across diverse, fine-grained contexts.
☆18 Updated last week
Related projects
Alternatives and complementary repositories for Codev-Bench
- ☆98 Updated 5 months ago
- NaturalCodeBench (Findings of ACL 2024) ☆56 Updated last month
- ☆14 Updated 2 months ago
- ☆39 Updated 5 months ago
- ☆24 Updated 2 months ago
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) ☆121 Updated 3 months ago
- Training and Benchmarking LLMs for Code Preference. ☆22 Updated this week
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆48 Updated 8 months ago
- Collection of papers for scalable automated alignment. ☆72 Updated 3 weeks ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆301 Updated 3 weeks ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆96 Updated 2 weeks ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆47 Updated 2 weeks ago
- ☆85 Updated 11 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆143 Updated 5 months ago
- ☆68 Updated 4 months ago
- ☆78 Updated 6 months ago
- ☆55 Updated this week
- [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset ☆84 Updated 4 months ago
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets" ☆47 Updated 5 months ago
- Generate the WizardCoder Instruct from the CodeAlpaca ☆20 Updated last year
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval ☆74 Updated last month
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆52 Updated last month
- ☆144 Updated 3 months ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆216 Updated 2 months ago
- Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning" ☆96 Updated 10 months ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models ☆166 Updated last month
- ☆21 Updated 5 months ago
- Counting-Stars (★) ☆76 Updated 2 months ago
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆57 Updated 7 months ago
- Official implementation of paper How to Understand Whole Repository? New SOTA on SWE-bench Lite (21.3%) ☆65 Updated 5 months ago