evo-eval/evoeval

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/evo-eval/evoeval)

evo-eval / evoeval

EvoEval: Evolving Coding Benchmarks via LLM

☆84

Alternatives and similar repositories for evoeval

Users that are interested in evoeval are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ise-uiuc / neuri-artifact
View on GitHub
Artifact for ESEC/FSE'23 paper "NeuRI: Diversifying DNN Generation via Inductive Rule Inference"
☆33Nov 13, 2023Updated 2 years ago
llm4code / 2024
View on GitHub
The First International Workshop on Large Language Model for Code 2024 (Co-Located with ICSE 2024)
☆18Oct 4, 2024Updated last year
zzjas / anypoc
View on GitHub
Generates executable Proof-of-Concept for any bug in any project. AI agents discover and reproduce vulnerabilities — verified, not halluc…
☆27May 5, 2026Updated 2 months ago
ise-uiuc / xft
View on GitHub
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
☆36Jul 2, 2024Updated 2 years ago
amazon-science / llm-code-preference
View on GitHub
Training and Benchmarking LLMs for Code Preference.
☆38Nov 15, 2024Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
claudeyj / patch_correctness
View on GitHub
☆13May 19, 2024Updated 2 years ago
ise-uiuc / blazedit
View on GitHub
Making code edting up to 7.7x faster using multi-layer speculation
☆23Feb 20, 2025Updated last year
FudanSELab / ClassEval
View on GitHub
Benchmark ClassEval for class-level code generation.
☆151Oct 24, 2024Updated last year
ise-uiuc / WhiteFox
View on GitHub
WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models (OOPSLA 2024)
☆84Aug 5, 2025Updated 11 months ago
ise-uiuc / uniapr
View on GitHub
Fast and Precise On-the-fly Patch Validation for All
☆10Feb 24, 2023Updated 3 years ago
ise-uiuc / Repilot
View on GitHub
Repilot, a patch generation tool introduced in the ESEC/FSE'23 paper "Copiloting the Copilots: Fusing Large Language Models with Completi…
☆139Oct 9, 2023Updated 2 years ago
evalplus / repoqa
View on GitHub
RepoQA: Evaluating Long-Context Code Understanding
☆136Nov 1, 2024Updated last year
shuzhenggao / ICL4code
View on GitHub
☆13Aug 9, 2023Updated 2 years ago
ganler / memcov
View on GitHub
Collect simple coverage information in memory.
☆11Oct 6, 2022Updated 3 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
zkx06111 / ALGO
View on GitHub
☆36May 25, 2023Updated 3 years ago
iSEngLab / LLM4UT_Empirical
View on GitHub
[ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing
☆13Feb 9, 2025Updated last year
all-the-noises / eval-arena
View on GitHub
☆34Mar 21, 2026Updated 3 months ago
codetlingua / codetlingua
View on GitHub
☆18Apr 15, 2024Updated 2 years ago
evalplus / evalplus
View on GitHub
Rigourous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
☆1,782Oct 2, 2025Updated 9 months ago
OpenAgentEval / SWE-ABS
View on GitHub
[ICML 2026] SWE-ABS: Adversarial Benchmark Strengthening Exposes Inflated Success Rates on Test-based Benchmark
☆21May 6, 2026Updated 2 months ago
ise-uiuc / DeepREL
View on GitHub
Fuzzing Deep-Learning Libraries via Automated Relational API Inference (ESEC/FSE 2022)
☆39May 17, 2023Updated 3 years ago
michaelpradel / LExecutor
View on GitHub
A learning-guided approach for executing arbitrary Python code snippets
☆16Mar 4, 2024Updated 2 years ago
HumanEval-V / HumanEval-V-Benchmark
View on GitHub
A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks
☆15Feb 25, 2025Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
ise-uiuc / nnsmith
View on GitHub
Automated DNN generation for fuzz testing and more
☆149Jan 14, 2025Updated last year
marcusm117 / IdentityChain
View on GitHub
[ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain
☆11Nov 24, 2025Updated 7 months ago
ARiSE-Lab / TRACED_ICSE_24
View on GitHub
☆22Mar 21, 2024Updated 2 years ago
ise-uiuc / FreeFuzz
View on GitHub
Free Lunch for Testing: Fuzzing Deep-Learning Libraries from Open Source (ICSE'22)
☆81Nov 2, 2022Updated 3 years ago
SEC-bench / SEC-bench-Pro
View on GitHub
☆40Jul 6, 2026Updated 2 weeks ago
JohnnyPeng18 / Coffe
View on GitHub
A Code Efficiency Benchmark for Code Generation
☆14May 26, 2025Updated last year
gmy2013 / LLM_Comment_Generation
View on GitHub
Source Code for Paper "Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning"
☆19Jun 9, 2023Updated 3 years ago
ise-uiuc / NablaFuzz
View on GitHub
Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23)
☆27Mar 2, 2024Updated 2 years ago
DeepSoftwareAnalytics / Telly
View on GitHub
Replication package for ISSTA2023 paper - Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
☆23Apr 9, 2023Updated 3 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
amazon-science / cceval
View on GitHub
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)
☆181Aug 15, 2025Updated 11 months ago
SparksofAGI / MHPP
View on GitHub
☆35Sep 14, 2025Updated 10 months ago
facebookresearch / swe-rl
View on GitHub
[NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
☆712Mar 16, 2025Updated last year
floatai / HumanEval-XL
View on GitHub
[LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
☆42Mar 7, 2025Updated last year
BugSwarm / bugswarm
View on GitHub
☆38Apr 8, 2026Updated 3 months ago
ise-uiuc / TitanFuzz
View on GitHub
☆94Sep 10, 2023Updated 2 years ago
pkuzqh / GrammarT5
View on GitHub
☆11May 18, 2024Updated 2 years ago