Industrial-level evaluation benchmarks for coding LLMs across the full life cycle of AI-native software development. An enterprise-grade code-LLM evaluation suite, continuously being opened up.
☆106 · Updated Apr 28, 2025
Alternatives and similar repositories for codefuse-evaluation
Users interested in codefuse-evaluation are comparing it to the libraries listed below.
- Index of the CodeFuse Repositories · ☆135 · Updated Sep 2, 2024
- High-performance LLM inference based on our optimized version of FastTransformer · ☆122 · Updated Dec 14, 2023
- A high-accuracy, high-efficiency multi-task fine-tuning framework for code LLMs; this work has been accepted by KDD 2024 · ☆710 · Updated Dec 30, 2024
- ☆34 · Updated Jul 23, 2024
- A collection of practical code generation tasks and tests in open source projects, complementary to HumanEval by OpenAI · ☆154 · Updated Dec 25, 2024
- An industry-first evaluation benchmark for LLMs in the DevOps/AIOps domain · ☆651 · Updated Jul 10, 2024
- ☆62 · Updated Jun 17, 2024
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) · ☆175 · Updated Aug 15, 2025
- A curated list of papers and applications on tool learning · ☆125 · Updated Dec 27, 2023
- ☆40 · Updated Oct 17, 2024
- DOMAINEVAL, an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference … · ☆14 · Updated Dec 12, 2024
- Query-Based Code Analysis Engine · ☆348 · Updated Sep 21, 2025
- A fine-grained Chinese LLM evaluation benchmark for the automotive industry, based on multi-turn open-ended questions · ☆38 · Updated Dec 26, 2023
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI · ☆488 · Updated Jan 3, 2026
- [TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets · ☆3,258 · Updated Mar 5, 2026
- [ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation" · ☆267 · Updated Oct 30, 2024
- A simple implementation of TinyGPTV in super-simple Zeta lego blocks · ☆16 · Updated Nov 11, 2024
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories · ☆68 · Updated Aug 15, 2024
- Chinese LLM evaluation, round one · ☆113 · Updated Oct 23, 2023
- A curated paper list on LLM reasoning · ☆90 · Updated Mar 4, 2024
- Chinese LLM evaluation, round two · ☆71 · Updated Oct 23, 2023
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" · ☆817 · Updated Jul 16, 2025
- Rigorous evaluation of LLM-synthesized code (NeurIPS 2023 & COLM 2024) · ☆1,698 · Updated Oct 2, 2025
- An AI-native IDE based on CodeFuse and OpenSumi · ☆283 · Updated Dec 3, 2025
- An official implementation of "Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards" · ☆37 · Updated Oct 3, 2025
- ☆234 · Updated Feb 28, 2026
- Code for the paper "Evaluating Large Language Models Trained on Code" · ☆3,163 · Updated Jan 17, 2025
- Coeditor: Leveraging Repo-level Diffs for Code Auto-editing · ☆32 · Updated Feb 25, 2024
- A collection of practical code generation tasks and tests from open source projects, complementary to HumanEval by OpenAI · ☆24 · Updated Jan 28, 2023
- ☆11 · Updated Jul 14, 2024
- 🎉LLaMA Demo 7B🎉 · ☆17 · Updated Mar 23, 2023
- ☆493 · Updated Aug 15, 2024
- This project provides several implementations for commit untangling and proposes a new representation of git patches by projecting the pa… · ☆12 · Updated Jul 28, 2025
- Language Models for Code Completion: a Practical Evaluation · ☆13 · Updated Jan 19, 2024
- Developer-Intent Driven Code Comment Generation · ☆20 · Updated Feb 14, 2023
- Code for the paper "A Structural Model for Contextual Code Changes" · ☆32 · Updated Oct 25, 2023
- An ongoing project: a library for graph foundation models · ☆13 · Updated Feb 7, 2024
- [NeurIPS 2024] EffiBench: Benchmarking the Efficiency of Automatically Generated Code · ☆60 · Updated Nov 30, 2024
- A framework for the evaluation of autoregressive code generation language models · ☆1,021 · Updated Jul 22, 2025
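Several of the benchmarks listed above (HumanEval-style suites, EvalPlus, BigCodeBench, LiveCodeBench) report pass@k scores. For reference, a minimal sketch of the unbiased pass@k estimator introduced in "Evaluating Large Language Models Trained on Code" is shown below; the function name and the example numbers are illustrative, not taken from any of these repositories:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Codex paper, Chen et al.).

    n: total samples generated per task
    c: number of samples that passed the tests
    k: evaluation budget (k <= n)
    """
    if n - c < k:
        # Fewer failures than the budget: at least one pass is guaranteed.
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 200 samples per task, 40 passing, budget of 10.
print(pass_at_k(200, 40, 10))
```

Per-task scores are then averaged over the benchmark's task set to give the reported pass@k figure.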