NaturalCodeBench (Findings of ACL 2024)
☆70Oct 14, 2024Updated last year
Alternatives and similar repositories for NaturalCodeBench
Users that are interested in NaturalCodeBench are comparing it to the libraries listed below.
- ☆83Apr 18, 2024Updated 2 years ago
- Official repository for the paper "COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis".☆17Feb 19, 2025Updated last year
- ☆17Feb 28, 2024Updated 2 years ago
- ☆57May 28, 2024Updated last year
- ☆316Aug 18, 2025Updated 9 months ago
- ☆13Mar 5, 2025Updated last year
- ☆10Nov 14, 2024Updated last year
- ☆22Jul 16, 2024Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation☆49Dec 22, 2023Updated 2 years ago
- ☆46Jun 11, 2025Updated 11 months ago
- A collection of practical code generation tasks and tests from open source projects. Complementary to HumanEval by OpenAI.☆24Jan 28, 2023Updated 3 years ago
- Reproducing R1 for Code with Reliable Rewards☆12Apr 9, 2025Updated last year
- [ACL 2025] Graph Aligned Large Language Models for Improved Source Code Understanding☆44May 18, 2025Updated last year
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)☆177Aug 15, 2025Updated 9 months ago
- Repository containing the website for the EMNLP 2023 conference☆17Feb 12, 2025Updated last year
- ☆12Mar 18, 2024Updated 2 years ago
- ☆25Jul 20, 2025Updated 10 months ago
- A modified Alphazero implementation with C++ where performance matters.☆19Mar 7, 2026Updated 2 months ago
- ☆16Nov 26, 2024Updated last year
- A collection of papers tackling automatic fact-checking (particularly of AI-generated content)☆13Nov 3, 2023Updated 2 years ago
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval☆88Sep 17, 2024Updated last year
- Repository for "SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques" publis…☆90Nov 4, 2023Updated 2 years ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL☆524Jun 6, 2025Updated 11 months ago
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"☆56May 22, 2025Updated 11 months ago
- ☆21Jul 24, 2025Updated 9 months ago
- [ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".☆273Oct 30, 2024Updated last year
- ☆159Aug 27, 2024Updated last year
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024☆204Aug 16, 2024Updated last year
- Collection of papers for scalable automated alignment.☆93Oct 22, 2024Updated last year
- ☆10Oct 28, 2019Updated 6 years ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Oct 11, 2024Updated last year
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI☆500Jan 3, 2026Updated 4 months ago
- Reproducing R1 for Code with Reliable Rewards☆308May 5, 2025Updated last year
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework☆285Jan 17, 2026Updated 4 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"☆864Jul 16, 2025Updated 10 months ago
- Official github repo for AutoDetect, an automated weakness detection framework for LLMs.☆46Jun 25, 2024Updated last year
- Recursive Abstractive Processing for Tree-Organized Retrieval☆10May 30, 2024Updated last year
- JoanAudit - A security slicing tool that helps security auditors to perform their security auditing tasks more efficiently☆10Sep 6, 2017Updated 8 years ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location.☆87Aug 10, 2024Updated last year