domaineval / DomainEval
DOMAINEVAL is an automatically constructed benchmark for multi-domain code generation. It consists of 2k+ subjects, each comprising a description, reference code, and tests, and covers six domains: Computation, Basic, Network, Cryptography, Visualization, and System.
☆14 Updated last year
Alternatives and similar repositories for DomainEval
Users who are interested in DomainEval are comparing it to the repositories listed below.
- ☆15 Updated last year
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆76 Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆57 Updated last year
- [EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences" ☆19 Updated last year
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… ☆36 Updated 2 years ago
- [ICML 2024] Adaptive decoding balances the diversity and coherence of open-ended text generation. ☆19 Updated last year
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆107 Updated 3 weeks ago
- A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu… ☆83 Updated last week
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆64 Updated last year
- Model merging is a highly efficient approach for long-to-short reasoning. ☆92 Updated last month
- ☆138 Updated 2 months ago
- Lightweight tool to identify Data Contamination in LLMs evaluation ☆52 Updated last year
- The implementation of the ACL 2024 paper "MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization" ☆42 Updated last year
- Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)" ☆48 Updated 5 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆52 Updated last year
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL 2024] ☆37 Updated last year
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆17 Updated 8 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆95 Updated 8 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆118 Updated 7 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆80 Updated last year
- Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models" ☆49 Updated 3 weeks ago
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA… ☆30 Updated last year
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ☆40 Updated last year
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆98 Updated 2 months ago
- [NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning" ☆136 Updated last month
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect… ☆130 Updated last month
- ☆46 Updated 6 months ago
- Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs" ☆33 Updated 5 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆25 Updated 11 months ago
- ☆58 Updated last year