jamesmurdza/humaneval-results

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/jamesmurdza/humaneval-results)

jamesmurdza / humaneval-results

Evaluation results of code generation LLMs

☆32

Alternatives and similar repositories for humaneval-results

Users that are interested in humaneval-results are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ReliableCoding / REPEAT
View on GitHub
☆10Apr 15, 2023Updated 3 years ago
amazon-science / incremental-parsing
View on GitHub
Incremental Python parser for constrained generation of code by LLMs.
☆18Sep 18, 2024Updated last year
iCSawyer / SecureVibeBench
View on GitHub
[ACL 2026 Main] SecureVibeBench: Benchmarking Secure Vibe Coding of AI Agents via Reconstructing Vulnerability-Introducing Scenarios
☆17Jun 28, 2026Updated last month
tonybaloney / spew
View on GitHub
A tool for generating random, syntactically-correct Python code. Designed for fuzzing and testing of tools that parse Python code.
☆23Sep 22, 2023Updated 2 years ago
bigcode-project / pii-lib
View on GitHub
Code for PII detection and redaction in code datasets
☆16Jan 24, 2023Updated 3 years ago
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
sola-st / DyPyBench
View on GitHub
☆18Nov 12, 2025Updated 8 months ago
all-the-noises / eval-arena
View on GitHub
☆34Mar 21, 2026Updated 4 months ago
google-research / runtime-error-prediction
View on GitHub
This is the repository for the paper Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descripti…
☆25Nov 18, 2022Updated 3 years ago
shunzh / Code-AI-Tree-Search
View on GitHub
☆118Jul 17, 2024Updated 2 years ago
nuprl / MultiPL-E
View on GitHub
A multi-programming language benchmark for LLMs
☆314Apr 12, 2026Updated 3 months ago
mahimanzum / FixEval
View on GitHub
We introduce FixEval , a dataset for competitive programming bug fixing along with a comprehensive test suite and show the necessity of e…
☆26Aug 31, 2022Updated 3 years ago
openai / human-eval-infilling
View on GitHub
Code for the paper "Efficient Training of Language Models to Fill in the Middle"
☆209Apr 2, 2023Updated 3 years ago
GraphPKU / number_cookbook
View on GitHub
Official repository for the paper Number Cookbook: Number Understanding of Language Models and How to Improve It.
☆21Mar 31, 2025Updated last year
abvijaykumar / python-lora-finetuning
View on GitHub
Finetuning a codegen model with python instruction set using QLORA technique for better efficacy
☆11Aug 31, 2023Updated 2 years ago
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
YiQi0318 / LLMs_daily_arxiv
View on GitHub
☆15Jul 25, 2024Updated 2 years ago
khtee / text-classification-pytorch
View on GitHub
Pytorch implementation of RNN, CNN, BiGRU and LSTM for text classifcation
☆10Apr 30, 2021Updated 5 years ago
abacaj / code-eval
View on GitHub
Run evaluation on LLMs using human-eval benchmark
☆431Sep 12, 2023Updated 2 years ago
KoyenaPal / future-lens
View on GitHub
Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
☆21Oct 24, 2025Updated 9 months ago
salmansherin / QExplore
View on GitHub
QExplore is a dynamic automatic exploration tool for dynamic web applications. It reverse engineers a state-flow model that can be used t…
☆13Mar 6, 2025Updated last year
amazon-science / cocomic
View on GitHub
CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context
☆19Feb 20, 2026Updated 5 months ago
aixcoder-plugin / nl2code-dataset
View on GitHub
Aix-bench, the Java benchmark for code synthesis problem.
☆52Aug 19, 2022Updated 3 years ago
microsoft / repoclassbench
View on GitHub
[ICML DMLR 2024] Repo that contains code for the paper titled: "Class-Level Code Generation from Natural Language Using Iterative, Tool-E…
☆18Updated this week
NL2Code / NL2Code.github.io
View on GitHub
Large Language Models Meet NL2Code: A Survey
☆35Nov 19, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
lunary-ai / llm-benchmarks
View on GitHub
LLM benchmarks
☆13Feb 22, 2024Updated 2 years ago
jack57lee / neuralCodeCompletion
View on GitHub
The implementation of the IJCAI 2018 paper: Code Completion with Neural Attention and Pointer Networks
☆18Sep 11, 2019Updated 6 years ago
BASE-LAB-SJTU / CosBench
View on GitHub
A dataset for natural language code search.
☆14Feb 13, 2020Updated 6 years ago
deep-learning-algorithm / LightWeightCNN
View on GitHub
轻量化卷积神经网络实现（SqueezeNet/MobileNet/ShuffleNet/MnasNet）
☆12Mar 5, 2026Updated 4 months ago
shuzhenggao / ICL4code
View on GitHub
☆13Aug 9, 2023Updated 2 years ago
FSoft-AI4Code / DocChecker
View on GitHub
DocChecker: Bootstrapping Code-Text Pretrained Language Model to Detect Inconsistency Between Code and Comment
☆15Jan 23, 2024Updated 2 years ago
mehryaragha / NoseBiometrics
View on GitHub
Feature extraction algorithm using spherical patches from the Gabor-wavelet normal maps
☆10Dec 31, 2018Updated 7 years ago
GammaTauAI / opentau
View on GitHub
Using Large Language Models for Repo-wide Type Prediction
☆113Dec 10, 2023Updated 2 years ago
rajarshihaldar / codetextmatch
View on GitHub
☆19Dec 8, 2022Updated 3 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
theoxo / self-repair
View on GitHub
[ICLR 2024]: Is Self-Repair a Silver Bullet for Code Generation?
☆15May 2, 2024Updated 2 years ago
YBRua / SrcMarker
View on GitHub
Code for paper "SrcMarker: Dual-Channel Source Code Watermarking via Scalable Code Transformations" (IEEE S&P 2024)
☆34Aug 8, 2024Updated last year
parth-pathak / Fault-prediction-using-NASA-MDP-Dataset
View on GitHub
Identifying the best algorithm for defect prediction in softwares and providing the most accurate results. The Metrics Data Program datas…
☆13Dec 23, 2019Updated 6 years ago
wangzhihan-scut / paper-software-defect-prediction
View on GitHub
☆15Jan 27, 2019Updated 7 years ago
xwzy / GNNPapers-CN-mirror
View on GitHub
arxiv国内镜像快速打开论文 The mirror of thunlp/GNNPapers in cn.arxiv.org/arxiv.las.ac.cn, Must-read papers on graph neural networks (GNN)
☆14Sep 12, 2021Updated 4 years ago
jun-zeng / Tailor
View on GitHub
Learning graph-based code representations for source-level functional similarity detection. ICSE'23
☆63Mar 27, 2023Updated 3 years ago
microsoft / ReACC
View on GitHub
Source codes for paper ”ReACC: A Retrieval-Augmented Code Completion Framework“
☆67Apr 18, 2022Updated 4 years ago