Industrial-level evaluation benchmarks for coding LLMs across the full life cycle of AI-native software development. An enterprise-grade evaluation suite for code LLMs, with continued open releases.
☆109Apr 28, 2025Updated last year
Alternatives and similar repositories for codefuse-evaluation
Users that are interested in codefuse-evaluation are comparing it to the libraries listed below.
- High-performance LLM inference based on our optimized version of FasterTransformer☆122Dec 14, 2023Updated 2 years ago
- A high-accuracy, high-efficiency multi-task fine-tuning framework for Code LLMs. This work has been accepted by KDD 2024.☆716Dec 30, 2024Updated last year
- ☆34Jul 23, 2024Updated last year
- A collection of practical code generation tasks and tests in open source projects. Complementary to HumanEval by OpenAI.☆156Dec 25, 2024Updated last year
- ☆64Jan 16, 2025Updated last year
- A curated list of papers and applications on tool learning.☆125Dec 27, 2023Updated 2 years ago
- ☆41Oct 17, 2024Updated last year
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)☆177Aug 15, 2025Updated 8 months ago
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …☆13Dec 12, 2024Updated last year
- The minimal, ad-hoc way of plug and play NebulaGraph with pip install, even inside Colab Notebook!☆21May 24, 2024Updated last year
- Query-Based Code Analysis Engine☆354Sep 21, 2025Updated 7 months ago
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI☆498Jan 3, 2026Updated 3 months ago
- [TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.☆3,312Apr 10, 2026Updated 3 weeks ago
- ☆14May 28, 2024Updated last year
- [ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".☆270Oct 30, 2024Updated last year
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks☆16Nov 11, 2024Updated last year
- A multi-programming language benchmark for LLMs☆302Apr 12, 2026Updated 2 weeks ago
- A curated paper list on LLM reasoning.☆90Mar 4, 2024Updated 2 years ago
- [AAAI 2024] LLMEval Phase II dataset — professional domain evaluation across 12 academic disciplines☆71Apr 15, 2026Updated 2 weeks ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"☆852Jul 16, 2025Updated 9 months ago
- Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024☆1,727Oct 2, 2025Updated 6 months ago
- AI Native IDE based on CodeFuse and OpenSumi☆286Dec 3, 2025Updated 4 months ago
- Collection of evaluation code for natural language generation.☆12Jan 6, 2021Updated 5 years ago
- Code for LLM_Catastrophic_Forgetting via SAM.☆11Jun 7, 2024Updated last year
- ☆47Dec 12, 2024Updated last year
- ☆239Feb 28, 2026Updated 2 months ago
- Code for the paper "Evaluating Large Language Models Trained on Code"☆3,212Jan 17, 2025Updated last year
- ☆13Apr 3, 2026Updated 3 weeks ago
- Coeditor: Leveraging Repo-level Diffs for Code Auto-editing☆31Feb 25, 2024Updated 2 years ago
- Lyra: A Benchmark for Turducken-Style Code Generation☆15Apr 22, 2022Updated 4 years ago
- ICML 2025 Spotlight, PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative AP…☆14Jun 27, 2025Updated 10 months ago
- A First Look at Conventional Commits Classification☆13Nov 18, 2024Updated last year
- A collection of practical code generation tasks and tests from open source projects. Complementary to HumanEval by OpenAI.☆24Jan 28, 2023Updated 3 years ago
- The Core Algorithm of SmartCommit.☆27Dec 20, 2021Updated 4 years ago
- Benchmark ClassEval for class-level code generation.☆146Oct 24, 2024Updated last year
- This is the implementation for the paper: Sequential Recommender System based on Hierarchical Attention Network☆11Mar 13, 2021Updated 5 years ago
- ☆492Aug 15, 2024Updated last year
- An intelligent assistant serving the entire software development lifecycle, powered by a Multi-Agent Framework, working with DevOps Toolk…☆1,287Jul 1, 2024Updated last year
- Developer-Intent Driven Code Comment Generation☆20Feb 14, 2023Updated 3 years ago