facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
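To give a flavor of the benchmark: each CRUXEval task pairs a short Python function with an input and an output, and the model must either predict the output from the input (output prediction) or recover an input that produces the output (input prediction). The function below is a hypothetical illustration of the task format, not an item from the official dataset.

```python
# A CRUXEval-style task (hypothetical example for illustration):
# given f and the input "benchmark", predict what f returns.

def f(s):
    # Drop vowels, then reverse the remaining characters.
    consonants = [c for c in s if c.lower() not in "aeiou"]
    return "".join(reversed(consonants))

# Output prediction: the model must fill in the right-hand side.
assert f("benchmark") == "krmhcnb"
```

Because the functions are short and execution is deterministic, model predictions can be checked exactly by running the code.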
Related projects:
- RepoQA: Evaluating Long-Context Code Understanding
- InstructCoder (formerly CodeInstruct): enables LLMs to edit code
- Code for the paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23)
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)
- RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems (ICLR 2024)
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models
- Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning"
- Accepted by Transactions on Machine Learning Research (TMLR)
- Enhancing AI Software Engineering with Repository-level Code Graph
- CodeUltraFeedback: aligning large language models to coding preferences
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
- BigCodeBench: Benchmarking Code Generation Towards AGI
- [NeurIPS 2023 D&B] Code repository for the InterCode benchmark (https://arxiv.org/abs/2306.14898)
- [EACL 2024] ICE-Score: Instructing Large Language Models to Evaluate Code
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
- A multi-programming language benchmark for LLMs
- EvoEval: Evolving Coding Benchmarks via LLM
- Evol-augment any dataset online
- Official repo for the ICLR 2024 paper "MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback" by Xingyao Wang*, Ziha…
- Repoformer: Selective Retrieval for Repository-Level Code Completion (ICML 2024)