StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
☆74 · Aug 31, 2024 · updated last year
Alternatives and similar repositories for APPS_Plus
Users interested in APPS_Plus are comparing it to the libraries listed below.
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation · ☆49 · Dec 22, 2023 · updated 2 years ago
- Accepted by Transactions on Machine Learning Research (TMLR) · ☆137 · Oct 5, 2024 · updated last year
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) · ☆73 · Jun 25, 2024 · updated last year
- Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning" · ☆117 · Jan 9, 2024 · updated 2 years ago
- Code for the paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23) · ☆90 · Jul 5, 2023 · updated 2 years ago
- The official implementation of Cross-Task Experience Sharing (COPS) · ☆29 · Oct 23, 2024 · updated last year
- Collection of papers for scalable automated alignment · ☆93 · Oct 22, 2024 · updated last year
- The official implementation of MA-LoT · ☆19 · Aug 4, 2025 · updated 6 months ago
- The datasets of our paper · ☆11 · Feb 26, 2024 · updated 2 years ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI · ☆107 · Mar 6, 2025 · updated 11 months ago
- Code for our ACL '23 paper titled "Grokking of Hierarchical Structure in Vanilla Transformers" · ☆24 · Oct 8, 2023 · updated 2 years ago
- ☆25 · Aug 23, 2024 · updated last year
- InstructCoder: Instruction Tuning Large Language Models for Code Editing (ACL 2024 SRW, oral) · ☆64 · Oct 4, 2024 · updated last year
- Official Implementation of NeurIPS'23 Paper "Cross-Episodic Curriculum for Transformer Agents" · ☆31 · Oct 12, 2023 · updated 2 years ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning · ☆30 · Mar 5, 2024 · updated last year
- ☆14 · Dec 1, 2025 · updated 3 months ago
- ☆11 · Jan 3, 2024 · updated 2 years ago
- ☆14 · Mar 5, 2024 · updated last year
- Benchmarks for evaluating MT models · ☆11 · Jun 26, 2024 · updated last year
- ☆12 · Mar 4, 2025 · updated 11 months ago
- Pythoness: use natural language to define Python functions · ☆20 · Apr 22, 2025 · updated 10 months ago
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World" · ☆27 · Aug 20, 2025 · updated 6 months ago
- A benchmark to evaluate situated inductive reasoning · ☆15 · Jan 7, 2025 · updated last year
- ☆10 · Jul 15, 2024 · updated last year
- Official PyTorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023] · ☆52 · Oct 26, 2025 · updated 4 months ago
- AI for Mathematics Paper List · ☆17 · Jan 14, 2025 · updated last year
- Repository of the paper 'CodeQueries: A Dataset of Semantic Queries over Code' published in ISEC 2024 · ☆13 · Apr 21, 2024 · updated last year
- [LREC-Coling 2024] PECC: Problem Extraction and Coding Challenges · ☆14 · May 30, 2024 · updated last year
- RACE is a multi-dimensional benchmark for code generation that focuses on Readability, mAintainability, Correctness, and Efficiency · ☆12 · Oct 12, 2024 · updated last year
- ☆12 · Jul 10, 2023 · updated 2 years ago
- LeetCode Training and Evaluation Dataset · ☆48 · Apr 22, 2025 · updated 10 months ago
- ☆56 · May 28, 2024 · updated last year
- Pseudo-code Instructions dataset · ☆27 · Dec 18, 2023 · updated 2 years ago
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations" · ☆15 · Feb 8, 2024 · updated 2 years ago
- This is the official code implementation of Bongard-OpenWorld (ICLR 2024) · ☆14 · Jan 6, 2025 · updated last year
- Debate interface, experiments, etc. · ☆10 · Mar 12, 2024 · updated last year
- Deep Q-Learning Auto Market Maker · ☆12 · Jun 12, 2021 · updated 4 years ago
- ☆15 · Sep 7, 2022 · updated 3 years ago
- Medical ML Benchmark · ☆11 · May 16, 2023 · updated 2 years ago