FSoft-AI4Code / CodeMMLU
[ICLR 2025] 🚀 CodeMMLU Evaluator: A framework for evaluating language models on the CodeMMLU multiple-choice question (MCQ) benchmark.
☆22 Updated 4 months ago
Alternatives and similar repositories for CodeMMLU:
Users who are interested in CodeMMLU compare it to the repositories listed below:
- [EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation ☆92 Updated 8 months ago
- [FORGE 2025] Predicting Program Behavior with Dynamic Dependencies Learning ☆24 Updated 8 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆86 Updated 2 weeks ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 SRW ☆59 Updated 6 months ago
- [NAACL 2025] Benchmark for Repository-Level Code Generation, focus on Executability, Correctness from Test Cases and Usage of Contexts fr… ☆25 Updated last month
- ☆70 Updated 5 months ago
- Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆61 Updated 2 weeks ago
- [FORGE 2025] Graph-based method for end-to-end code completion with context awareness on repository ☆62 Updated 7 months ago
- ☆93 Updated last month
- A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages. ☆52 Updated 6 months ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆136 Updated 6 months ago
- Repoformer: Selective Retrieval for Repository-Level Code Completion (ICML 2024) ☆55 Updated 9 months ago
- Language Model for Mainframe Modernization ☆51 Updated 8 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆59 Updated 3 months ago
- Official code for the paper "CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules" ☆45 Updated 3 months ago
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆57 Updated last year
- Training and Benchmarking LLMs for Code Preference. ☆33 Updated 5 months ago
- Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task" ☆27 Updated 2 weeks ago
- ☆26 Updated 3 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆116 Updated 10 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆79 Updated 3 weeks ago
- Open-source Self-Instruction Tuning Code LLM ☆170 Updated last year
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval ☆79 Updated 7 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆84 Updated 5 months ago
- ☆109 Updated 9 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆47 Updated last year
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆64 Updated 7 months ago
- ☆41 Updated 3 weeks ago
- [LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization ☆39 Updated last month
- DocChecker: Bootstrapping Code-Text Pretrained Language Model to Detect Inconsistency Between Code and Comment ☆14 Updated last year