A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
☆38 · Sep 4, 2024 · Updated last year
Alternatives and similar repositories for DevEval
Users that are interested in DevEval are comparing it to the libraries listed below.
- ☆27 · Jul 20, 2024 · Updated last year
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories · ☆69 · Aug 15, 2024 · Updated last year
- ☆61 · Jun 19, 2024 · Updated last year
- ☆14 · Jul 22, 2021 · Updated 4 years ago
- ☆14 · May 28, 2024 · Updated last year
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI · ☆106 · Mar 6, 2025 · Updated last year
- A Comprehensive Benchmark for Software Development. · ☆131 · May 30, 2024 · Updated last year
- ☆32 · Jan 14, 2025 · Updated last year
- The Infibench variant of bigcode-evaluation-harness --- a framework for the evaluation of autoregressive code generation language models. · ☆14 · Oct 19, 2024 · Updated last year
- A First Look at Conventional Commits Classification · ☆13 · Nov 18, 2024 · Updated last year
- ☆14 · Dec 12, 2023 · Updated 2 years ago
- A Code Efficiency Benchmark for Code Generation · ☆13 · May 26, 2025 · Updated 10 months ago
- Official repository of the paper: Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code (Findings of EACL … · ☆12 · Mar 26, 2026 · Updated 2 weeks ago
- ☆28 · Nov 10, 2025 · Updated 5 months ago
- ☆11 · Jul 14, 2024 · Updated last year
- Baselines for all tasks from Long Code Arena benchmarks 🏟️ · ☆39 · Mar 30, 2025 · Updated last year
- Code for EMNLP 2021 Paper "Recall and Learn: A Memory-augmented Solver for Math Word Problems". · ☆16 · Oct 20, 2022 · Updated 3 years ago
- Code and data for AAAI 2022 paper "Multilingual Code Snippets Training for Program Translation" · ☆10 · Mar 7, 2022 · Updated 4 years ago
- ☆18 · Mar 18, 2024 · Updated 2 years ago
- Tally of ○○女·○○男 ("○○ woman" / "○○ man") labels in society-section articles of Korean-language newspapers, 1990–2021 · ☆17 · Sep 26, 2023 · Updated 2 years ago
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues · ☆13 · Feb 7, 2024 · Updated 2 years ago
- ☆24 · Nov 19, 2024 · Updated last year
- A collection of some awesome public projects about LLM-based Web Agents and Tools. · ☆12 · Apr 25, 2024 · Updated last year
- ☆15 · Jul 20, 2025 · Updated 8 months ago
- ☆11 · Oct 16, 2023 · Updated 2 years ago
- Benchmark ClassEval for class-level code generation. · ☆148 · Oct 24, 2024 · Updated last year
- [COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models · ☆18 · Jan 18, 2025 · Updated last year
- A tracer to generate sequence diagrams from running Python programs. · ☆16 · Feb 5, 2019 · Updated 7 years ago
- Is Neuron Coverage a Meaningful Measure for Testing Deep Neural Networks? (FSE 2020) · ☆10 · Sep 23, 2021 · Updated 4 years ago
- [ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation". · ☆269 · Oct 30, 2024 · Updated last year
- Language and Computers (second semester of the 2021 academic year, Department of Linguistics, Seoul National University) · ☆13 · Aug 16, 2022 · Updated 3 years ago
- 🦮 Code and pretrained models for Findings of ACL 2022 paper "LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrie… · ☆49 · Apr 25, 2022 · Updated 3 years ago
- [EMNLP 2024] FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents · ☆22 · Jan 6, 2025 · Updated last year
- ☆14 · Feb 18, 2025 · Updated last year
- [EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning · ☆15 · May 13, 2025 · Updated 11 months ago
- ☆10 · Jul 11, 2022 · Updated 3 years ago
- [ISSTA'24] A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing · ☆12 · Jan 7, 2025 · Updated last year
- ☆10 · May 28, 2023 · Updated 2 years ago
- CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context · ☆19 · Feb 20, 2026 · Updated last month