Industrial-level evaluation benchmarks for coding LLMs across the full life-cycle of AI-native software development. An enterprise-grade code LLM evaluation system, continuously being open-sourced.
☆105 · Updated Apr 28, 2025
Alternatives and similar repositories for codefuse-evaluation
Users interested in codefuse-evaluation are comparing it to the libraries listed below.
- A high-accuracy and high-efficiency multi-task fine-tuning framework for code LLMs; accepted at KDD 2024. ☆708 · Updated Dec 30, 2024
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …) ☆14 · Updated Dec 12, 2024
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) ☆174 · Updated Aug 15, 2025
- A collection of practical code generation tasks and tests in open source projects. Complementary to HumanEval by OpenAI. ☆154 · Updated Dec 25, 2024
- ☆56 · Updated May 28, 2024
- Simple implementation of TinyGPT-V in super simple Zeta lego blocks ☆16 · Updated Nov 11, 2024
- High-performance LLM inference based on our optimized version of FasterTransformer ☆122 · Updated Dec 14, 2023
- ☆14 · Updated May 28, 2024
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories ☆67 · Updated Aug 15, 2024
- A Chinese LLM evaluation benchmark for the automotive industry, with fine-grained evaluation based on multi-turn open-ended questions ☆38 · Updated Dec 26, 2023
- ☆40 · Updated Oct 17, 2024
- A multi-programming-language benchmark for LLMs ☆298 · Updated Jan 28, 2026
- ☆18 · Updated Apr 15, 2024
- LLM evaluation. ☆16 · Updated Nov 7, 2023
- [TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets. ☆3,232 · Updated Feb 1, 2026
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆49 · Updated Dec 22, 2023
- ☆45 · Updated Dec 12, 2024
- CD4Py: Code De-Duplication for Python ☆23 · Updated Dec 13, 2020
- ☆21 · Updated Aug 19, 2024
- [ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation" ☆266 · Updated Oct 30, 2024
- Benchmark ClassEval for class-level code generation. ☆145 · Updated Oct 24, 2024
- MCP server for executing CMD commands; can be hooked up to Claude for additional agentics. ☆23 · Updated Feb 14, 2025
- Query-Based Code Analysis Engine ☆348 · Updated Sep 21, 2025
- Rigorous evaluation of LLM-synthesized code (NeurIPS 2023 & COLM 2024) ☆1,688 · Updated Oct 2, 2025
- ☆23 · Updated Sep 18, 2023
- The Core Algorithm of SmartCommit. ☆27 · Updated Dec 20, 2021
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆803 · Updated Jul 16, 2025
- A simple, lightweight Model Context Protocol (MCP) server integration framework ☆17 · Updated Jan 23, 2026
- cppminer produces code2seq-compatible datasets from C++ code bases. ☆23 · Updated Apr 5, 2020
- Coeditor: Leveraging Repo-level Diffs for Code Auto-editing ☆31 · Updated Feb 25, 2024
- A collection of practical code generation tasks and tests from open source projects. Complementary to HumanEval by OpenAI. ☆24 · Updated Jan 28, 2023
- [NeurIPS 2024] EffiBench: Benchmarking the Efficiency of Automatically Generated Code ☆60 · Updated Nov 30, 2024
- Code for the paper "A Structural Model for Contextual Code Changes" ☆32 · Updated Oct 25, 2023
- Code for the paper "Evaluating Large Language Models Trained on Code", which introduced HumanEval and the pass@k metric (sketched after this list) ☆3,137 · Updated Jan 17, 2025
- NaturalCodeBench (Findings of ACL 2024) ☆68 · Updated Oct 14, 2024
- Structured TRIZ prompt engineering for LLMs in an open, portable XML format (MIT licensed). ☆14 · Updated Nov 11, 2025
- Hoppity ☆60 · Updated Nov 25, 2020
- ☆34 · Updated Jul 23, 2024
- CodeBERTScore: an automatic metric for code generation, based on BERTScore ☆207 · Updated Mar 1, 2024
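Many of the entries above (the HumanEval complements, the rigorous-evaluation and LiveCodeBench projects) report results as pass@k, the metric introduced in "Evaluating Large Language Models Trained on Code". For reference, here is a minimal sketch of that paper's unbiased pass@k estimator; the function name and the example numbers are illustrative and not taken from any repository listed here.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generated samples per problem,
    of which c pass all tests, computes 1 - C(n-c, k) / C(n, k), i.e. the
    probability that at least one of k drawn samples is correct, using a
    numerically stable product form."""
    if n - c < k:
        return 1.0  # too few failing samples: every size-k subset contains a pass
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Illustrative numbers: 200 samples per problem, 37 passing, budget k=10
print(f"pass@10 = {pass_at_k(n=200, c=37, k=10):.4f}")
```

Benchmark-level scores are then the mean of this estimate over all problems; generating n > k samples per problem keeps the estimate's variance low.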