NaturalCodeBench (Findings of ACL 2024)
☆70 Oct 14, 2024 Updated last year
Alternatives and similar repositories for NaturalCodeBench
Users who are interested in NaturalCodeBench are comparing it to the libraries listed below.
- ☆83 Apr 18, 2024 Updated 2 years ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆169 Oct 11, 2024 Updated last year
- Official repository for the paper "COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis". ☆18 Feb 19, 2025 Updated last year
- ☆17 Feb 28, 2024 Updated 2 years ago
- ☆57 May 28, 2024 Updated 2 years ago
- ☆318 Aug 18, 2025 Updated 9 months ago
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment ☆20 Apr 2, 2024 Updated 2 years ago
- ☆13 Mar 5, 2025 Updated last year
- ☆10 Nov 14, 2024 Updated last year
- ☆22 Jul 16, 2024 Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆50 Dec 22, 2023 Updated 2 years ago
- ☆47 Jun 11, 2025 Updated 11 months ago
- Reproducing R1 for Code with Reliable Rewards ☆12 Apr 9, 2025 Updated last year
- [ACL 2025] Graph Aligned Large Language Models for Improved Source Code Understanding ☆45 May 18, 2025 Updated last year
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) ☆179 Aug 15, 2025 Updated 9 months ago
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories ☆71 Aug 15, 2024 Updated last year
- arXiv link: https://arxiv.org/abs/2409.01944 ☆23 Feb 20, 2025 Updated last year
- ☆25 Jul 20, 2025 Updated 10 months ago
- A modified Alphazero implementation with C++ where performance matters. ☆19 Updated this week
- A collection of papers tackling automatic fact-checking (particularly of AI-generated content) ☆13 Nov 3, 2023 Updated 2 years ago
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval ☆89 Sep 17, 2024 Updated last year
- Repository for "SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques" publis… ☆92 Nov 4, 2023 Updated 2 years ago
- ☆47 Dec 12, 2024 Updated last year
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" ☆56 May 22, 2025 Updated last year
- ☆21 Jul 24, 2025 Updated 10 months ago
- [ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation". ☆272 Oct 30, 2024 Updated last year
- ☆159 Aug 27, 2024 Updated last year
- ☆50 Sep 6, 2023 Updated 2 years ago
- Collection of papers for scalable automated alignment. ☆93 Oct 22, 2024 Updated last year
- ☆10 May 25, 2017 Updated 9 years ago
- Reproducing R1 for Code with Reliable Rewards ☆310 May 5, 2025 Updated last year
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework ☆296 Jan 17, 2026 Updated 4 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆878 Jul 16, 2025 Updated 10 months ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆73 Aug 31, 2024 Updated last year
- Official github repo for AutoDetect, an automated weakness detection framework for LLMs. ☆46 Jun 25, 2024 Updated last year
- Recursive Abstractive Processing for Tree-Organized Retrieval ☆10 May 30, 2024 Updated 2 years ago
- [ACL 2025 Main] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent ☆37 Nov 29, 2024 Updated last year
- ☆139 May 8, 2025 Updated last year
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 May 5, 2024 Updated 2 years ago