LingmaTongyi / Codev-BenchLinks

Codev-Bench (Code Development Benchmark), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Bench assesses whether a code completion tool can accurately capture a developer's immediate intent and suggest appropriate code snippets across diverse, fine-grained contexts.

☆46

Alternatives and similar repositories for Codev-Bench

Users that are interested in Codev-Bench are comparing it to the libraries listed below

Sorting:

multi-swe-bench / multi-swe-bench
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
☆210Updated this week
thunlp / DebugBench
The repository for paper "DebugBench: "Evaluating Debugging Capability of Large Language Models".
☆78Updated last year
amazon-science / cceval
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)
☆149Updated 11 months ago
THUDM / NaturalCodeBench
NaturalCodeBench (Findings of ACL 2024)
☆67Updated 9 months ago
open-compass / DevEval
A Comprehensive Benchmark for Software Development.
☆111Updated last year
Leolty / repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆169Updated 11 months ago
YerbaPage / Awesome-Repo-Level-Code-Generation
Must-read papers on Repository-level Code Generation & Issue Resolution 🔥
☆122Updated last week
RepoUnderstander / RepoUnderstander
Official implementation of paper How to Understand Whole Repository? New SOTA on SWE-bench Lite (21.3%)
☆86Updated 3 months ago
LingmaTongyi / Lingma-SWE-GPT
Inference code of Lingma SWE-GPT
☆233Updated 7 months ago
MCEVAL / McEval
☆43Updated 7 months ago
CodeEditorBench / CodeEditorBench
☆49Updated last year
facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆149Updated 9 months ago
ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆237Updated 2 months ago
R2E-Gym / R2E-Gym
Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆121Updated last week
DIRECT-BIT / SRA-MCTS
☆33Updated last month
shrivastavadisha / repo_level_prompt_generation
☆124Updated 2 years ago
THUDM / ChatGLM-Math
☆83Updated last year
SparksofAGI / MHPP
☆31Updated 3 weeks ago
seketeam / EvoCodeBench
An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories
☆61Updated 11 months ago
qishenghu / InstructCoder
InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw
☆61Updated 9 months ago
microsoft / SWE-bench-Live
🚀 SWE-bench Goes Live!
☆94Updated last week
multi-swe-bench / multi-swe-bench-env
☆1Updated 10 months ago
SuperGPQA / SuperGPQA
☆155Updated 2 months ago
ntunlp / xCodeEval
xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
☆84Updated 10 months ago
QwenLM / CodeElo
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
☆45Updated 5 months ago
DeepSoftwareAnalytics / RLCoder
Reinforcement Learning for Repository-Level Code Completion
☆35Updated 11 months ago
bytedance / FullStackBench
Official repository for our paper "FullStack Bench: Evaluating LLMs as Full Stack Coders"
☆94Updated 2 months ago
ntunlp / ExecEval
A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages.
☆55Updated 8 months ago
amazon-science / Repoformer
Repoformer: Selective Retrieval for Repository-Level Code Completion (ICML 2024)
☆55Updated last month
APEXLAB / CodeApex
☆49Updated last year