facebookresearch / cruxeval

CRUXEval: Code Reasoning, Understanding, and Execution Evaluation

☆151

Alternatives and similar repositories for cruxeval

Users who are interested in cruxeval are comparing it to the repositories listed below.

evalplus / repoqa
RepoQA: Evaluating Long-Context Code Understanding
☆113 Updated 9 months ago
ntunlp / xCodeEval
xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
☆86 Updated 10 months ago
amazon-science / cceval
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)
☆151 Updated last year
Leolty / repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆168 Updated 11 months ago
reddy-lab-code-research / PPOCoder
Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning"
☆114 Updated last year
qishenghu / InstructCoder
InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw
☆62 Updated 9 months ago
R2E-Gym / R2E-Gym
Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆136 Updated 2 weeks ago
ntunlp / ExecEval
A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages.
☆56 Updated 9 months ago
SparksofAGI / MHPP
☆32 Updated last month
Ablustrund / APPS_Plus
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
☆67 Updated 11 months ago
crux-eval / eval-arena
☆28 Updated 2 weeks ago
r2e-project / r2e
r2e: turn any github repository into a programming agent environment
☆128 Updated 3 months ago
Zyq-scut / RLTF
Accepted by Transactions on Machine Learning Research (TMLR)
☆130 Updated 9 months ago
ise-uiuc / xft
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
☆33 Updated last year
amazon-science / llm-code-preference
Training and Benchmarking LLMs for Code Preference.
☆34 Updated 8 months ago
nuprl / MultiPL-E
A multi-programming language benchmark for LLMs
☆265 Updated 2 weeks ago
evo-eval / evoeval
EvoEval: Evolving Coding Benchmarks via LLM
☆75 Updated last year
thunlp / DebugBench
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".
☆79 Updated last year
bigcode-project / the-stack-v2
Code for the curation of The Stack v2 and StarCoder2 training data
☆114 Updated last year
princeton-nlp / intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
☆223 Updated last year
xlang-ai / DS-1000
[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".
☆251 Updated 9 months ago
CodeEditorBench / CodeEditorBench
☆49 Updated last year
gonglinyuan / safim
☆36 Updated 2 months ago
ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆240 Updated 2 months ago
amazon-science / mxeval
☆110 Updated last year
bigcode-project / astraios
Astraios: Parameter-Efficient Instruction Tuning Code Language Models
☆59 Updated last year
shunzh / Code-AI-Tree-Search
☆119 Updated last year
openai / human-eval-infilling
Code for the paper "Efficient Training of Language Models to Fill in the Middle"
☆183 Updated 2 years ago
shrivastavadisha / repo_level_prompt_generation
☆124 Updated 2 years ago
open-compass / DevEval
A Comprehensive Benchmark for Software Development.
☆111 Updated last year