ntunlp/xCodeEval

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ntunlp/xCodeEval)

ntunlp / xCodeEval

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

☆90

Alternatives and similar repositories for xCodeEval

Users that are interested in xCodeEval are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ntunlp / ExecEval
View on GitHub
A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages.
☆64Oct 21, 2024Updated last year
CarperAI / decontamination
View on GitHub
This repository contains code for cleaning your training data of benchmark data to help combat data snooping.
☆28Apr 21, 2023Updated 3 years ago
Naman-ntc / FastCode
View on GitHub
Utilities for efficient fine-tuning, inference and evaluation of code generation models
☆20Oct 3, 2023Updated 2 years ago
rizwan09 / REDCODER
View on GitHub
☆44Jun 24, 2025Updated last year
multimodal-art-projection / CodeCriticBench
View on GitHub
☆16Nov 1, 2025Updated 8 months ago
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
jstzwj / ChatTester
View on GitHub
No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation
☆19Jun 28, 2023Updated 3 years ago
zorazrw / odex
View on GitHub
[EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation
☆49Dec 22, 2023Updated 2 years ago
YihongDong / CDD-TED4LLMs
View on GitHub
☆16Nov 26, 2024Updated last year
ICSE-DOME / DOME
View on GitHub
Developer-Intent Driven Code Comment Generation
☆21Feb 14, 2023Updated 3 years ago
NizarIslah / GitChameleon
View on GitHub
☆14Jul 18, 2025Updated last year
facebookresearch / lss_eval
View on GitHub
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31Aug 25, 2023Updated 2 years ago
vis-nlp / OpenCQA
View on GitHub
☆13Jun 20, 2023Updated 3 years ago
RenzeLou / AAAR-1.0
View on GitHub
The source code for running LLMs on the AAAR-1.0 benchmark.
☆20Apr 5, 2025Updated last year
facebookresearch / cruxeval
View on GitHub
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆171Oct 11, 2024Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
schauppi / Self-Rewarding-Language-Models
View on GitHub
☆50May 13, 2024Updated 2 years ago
searchableai / ChainCQG
View on GitHub
☆13Feb 11, 2021Updated 5 years ago
CoderEval / CoderEval
View on GitHub
A collection of practical code generation tasks and tests in open source projects. Complementary to HumanEval by OpenAI.
☆158Dec 25, 2024Updated last year
amazon-science / cceval
View on GitHub
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)
☆181Aug 15, 2025Updated 11 months ago
ntunlp / LLMSanitize
View on GitHub
An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).
☆61Aug 13, 2024Updated last year
google-deepmind / icml2024-roundtrip-correctness
View on GitHub
☆17Jun 18, 2024Updated 2 years ago
nuprl / MultiPL-T
View on GitHub
Knowledge transfer from high-resource to low-resource programming languages for Code LLMs
☆17Aug 12, 2025Updated 11 months ago
THUDM / NaturalCodeBench
View on GitHub
NaturalCodeBench (Findings of ACL 2024)
☆70Oct 14, 2024Updated last year
Eric-Wallace / trickme-interface
View on GitHub
Code for the 2019 TACL Paper "Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples"
☆36Jul 3, 2019Updated 7 years ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
ctlllll / understanding_llm_benchmarks
View on GitHub
Understanding the correlation between different LLM benchmarks
☆30Jan 11, 2024Updated 2 years ago
ZJU-ACES-ISE / ChatUniTest_IDEA_Plugin
View on GitHub
☆11Feb 20, 2024Updated 2 years ago
reddy-lab-code-research / PPOCoder
View on GitHub
Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning"
☆116Jan 9, 2024Updated 2 years ago
Adlik / model_zoo
View on GitHub
☆11Dec 26, 2025Updated 6 months ago
evalplus / evalplus
View on GitHub
Rigourous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
☆1,782Oct 2, 2025Updated 9 months ago
thunlp / DebugBench
View on GitHub
The repository for paper "DebugBench: "Evaluating Debugging Capability of Large Language Models".
☆87Jul 13, 2024Updated 2 years ago
amazon-science / llm-code-preference
View on GitHub
Training and Benchmarking LLMs for Code Preference.
☆38Nov 15, 2024Updated last year
SWE-Gym / SWE-Bench-Fork
View on GitHub
☆13Mar 5, 2025Updated last year
SEKE-Adversary / MHM
View on GitHub
Generating Adversarial Examples for Holding Robustness of Source Code Processing Models
☆17Dec 2, 2021Updated 4 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
allenai / allennlp-server
View on GitHub
A simple demo server for AllenNLP models.
☆28Jun 26, 2023Updated 3 years ago
bigscience-workshop / multilingual-modeling
View on GitHub
BLOOM+1: Adapting BLOOM model to support a new unseen language
☆74Mar 2, 2024Updated 2 years ago
my-other-github-account / llm-humaneval-benchmarks
View on GitHub
☆86May 15, 2026Updated 2 months ago
dxlong2000 / SG-CQG
View on GitHub
[ACL 2023] Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation
☆14Jul 11, 2023Updated 3 years ago
vinci-grape / ThinkRepair
View on GitHub
This is the repository for the paper titled "ThinkRepair: Self-Directed Automated Program Repair" accepted by ISSTA'24.
☆31Jan 10, 2026Updated 6 months ago
codetlingua / codetlingua
View on GitHub
☆18Apr 15, 2024Updated 2 years ago
QiushiSun / Awesome-Code-Intelligence
View on GitHub
Neural Code Intelligence Survey 2024-25; Reading lists and resources
☆281Jul 24, 2025Updated 11 months ago