[LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
☆41 · Mar 7, 2025 · Updated last year
Alternatives and similar repositories for HumanEval-XL
Users interested in HumanEval-XL are comparing it to the libraries listed below.
- [NeurIPS 2024] Self-Optimization Improves the Efficiency of Code Generation ☆14 · May 10, 2025 · Updated 10 months ago
- ☆18 · Aug 11, 2022 · Updated 3 years ago
- For our ICSE23 paper "Impact of Code Language Models on Automated Program Repair" by Nan Jiang, Kevin Liu, Thibaud Lutellier, and Lin Tan ☆63 · Oct 16, 2024 · Updated last year
- Generating Adversarial Examples for Holding Robustness of Source Code Processing Models ☆15 · Dec 2, 2021 · Updated 4 years ago
- This repo is the artifact of FUEL ☆13 · Dec 2, 2025 · Updated 3 months ago
- Official implementation for "MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models" ☆18 · Oct 26, 2024 · Updated last year
- DocChecker: Bootstrapping Code-Text Pretrained Language Model to Detect Inconsistency Between Code and Comment ☆15 · Jan 23, 2024 · Updated 2 years ago
- [COLING25] CodeJudge Eval: Can Large Language Models be Good Judges in Code Understanding? ☆12 · Dec 3, 2024 · Updated last year
- RACE is a multi-dimensional benchmark for code generation that focuses on Readability, mAintainability, Correctness, and Efficiency. ☆12 · Oct 12, 2024 · Updated last year
- Replication Package for "Natural Attack for Pre-trained Models of Code", ICSE 2022 ☆51 · Nov 7, 2025 · Updated 4 months ago
- Evaluate state-of-the-art sparse embedding models on the LIMIT dataset (`limit-small` and `limit`) from Google's paper `On the Theoretica…` ☆15 · Sep 4, 2025 · Updated 6 months ago
- ☆14 · Mar 3, 2022 · Updated 4 years ago
- Contests based Dataset for Code Generation ☆13 · Dec 11, 2022 · Updated 3 years ago
- Dataflow-guided retrieval augmentation for repository-level code completion, ACL 2024 (main)