domaineval / DomainEval
DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation consisting of 2k+ subjects (i.e., description, reference code, and tests) across six domains: Computation, Basic, Network, Cryptography, Visualization, and System.
☆13 · Updated 9 months ago
Alternatives and similar repositories for DomainEval
Users interested in DomainEval are comparing it to the repositories listed below.
- ☆15 · Updated 10 months ago
- RACE is a multi-dimensional benchmark for code generation that focuses on Readability, mAintainability, Correctness, and Efficiency. ☆10 · Updated 11 months ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆74 · Updated 10 months ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆27 · Updated last year
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆62 · Updated 10 months ago
- Official code implementation for the ACL 2025 paper "CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis" ☆30 · Updated 4 months ago
- [NeurIPS 2023 D&B Track] Code and data for the paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua…" ☆35 · Updated 2 years ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆69 · Updated 9 months ago
- GSM-Plus: Data, code, and evaluation for enhancing robust mathematical reasoning in math word problems ☆63 · Updated last year
- [NAACL 2024 Outstanding Paper] Source code for the paper "R-Tuning: Instructing Large Language Models to Say 'I Don't…" ☆121 · Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs) ☆56 · Updated last year
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆38 · Updated last year
- Official implementation of "Privacy Implications of Retrieval-Based Language Models" (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆36 · Updated last year
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions"; the code base comes from open-instruct and LA… ☆29 · Updated 10 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆23 · Updated 9 months ago
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆70 · Updated 2 years ago
- Methods and evaluation for aligning language models temporally ☆30 · Updated last year
- The implementation of the ACL 2024 paper "MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization" ☆42 · Updated last year
- [NeurIPS 2024] Code and data repo for the paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning" ☆27 · Updated last year
- [2025-TMLR] A Survey on the Honesty of Large Language Models ☆59 · Updated 10 months ago
- [NeurIPS 2025 D&B Track] Evaluation code repo for the paper "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts" ☆30 · Updated 4 months ago
- ☆67 · Updated 3 months ago
- Lightweight tool to identify data contamination in LLM evaluation ☆52 · Updated last year
- Official code repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs], published as a conference paper at **COLM 2025**…