[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".
☆274 · Oct 30, 2024 · Updated last year
Alternatives and similar repositories for DS-1000
Users who are interested in DS-1000 are comparing it to the libraries listed below.
- ☆34 · Mar 21, 2026 · Updated 2 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆50 · Dec 22, 2023 · Updated 2 years ago
- A framework for the evaluation of autoregressive code generation language models. ☆1,048 · Jul 22, 2025 · Updated 10 months ago
- ☆17 · Dec 9, 2022 · Updated 3 years ago
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI ☆507 · Jan 3, 2026 · Updated 5 months ago
- [ICLR 2023] Code for the paper "Binding Language Models in Symbolic Languages" ☆326 · Aug 25, 2023 · Updated 2 years ago
- Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024 ☆1,764 · Oct 2, 2025 · Updated 8 months ago
- 🐙 OctoPack: Instruction Tuning Code Large Language Models ☆478 · Feb 5, 2025 · Updated last year
- Code for "Natural Language to Code Translation with Execution" ☆41 · Nov 2, 2022 · Updated 3 years ago
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" ☆109 · Jul 15, 2023 · Updated 2 years ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆170 · Oct 11, 2024 · Updated last year
- Code for the paper "Evaluating Large Language Models Trained on Code" ☆3,259 · Jan 17, 2025 · Updated last year
- Code for generating the JuICe dataset. ☆37 · Oct 27, 2021 · Updated 4 years ago
- APPS: Automated Programming Progress Standard (NeurIPS 2021) ☆533 · Jun 19, 2024 · Updated 2 years ago
- CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering ☆23 · Feb 26, 2021 · Updated 5 years ago
- [EMNLP 2022] Unifying and multi-tasking structured knowledge grounding with language models ☆566 · Aug 22, 2023 · Updated 2 years ago
- An MBTI test on large language models like GPT-3. ☆27 · May 2, 2022 · Updated 4 years ago
- Contest-based dataset for code generation ☆13 · Dec 11, 2022 · Updated 3 years ago
- ☆10 · Apr 15, 2023 · Updated 3 years ago
- A collection of papers on methods that use language to interact with an environment, including interaction with the real world, simulated worlds, or the WWW… ☆128 · Jul 26, 2023 · Updated 2 years ago
- Lyra: A Benchmark for Turducken-Style Code Generation ☆15 · Apr 22, 2022 · Updated 4 years ago
- A multi-programming language benchmark for LLMs ☆307 · Apr 12, 2026 · Updated 2 months ago
- ☆53 · Aug 25, 2023 · Updated 2 years ago
- ☆14 · Aug 18, 2022 · Updated 3 years ago
- [ACL '24] Source code for paper: INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing ☆30 · Nov 25, 2024 · Updated last year
- [ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling" ☆25 · Mar 28, 2024 · Updated 2 years ago
- ☆19 · Aug 9, 2024 · Updated last year
- Mapping Language to Code in a Programmatic Context ☆80 · Jan 27, 2021 · Updated 5 years ago
- Official repository of the paper: Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code (Findings of EACL … ☆12 · Mar 26, 2026 · Updated 2 months ago
- The CodeInsight dataset is designed for code generation tasks, providing developers with expert-curated examples that bridge the gap betw… ☆15 · Oct 22, 2024 · Updated last year
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents ☆556 · Oct 28, 2023 · Updated 2 years ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆101 · Apr 9, 2025 · Updated last year
- Official repository for the paper "COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis". ☆18 · Feb 19, 2025 · Updated last year
- [EACL'23] MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages ☆23 · Feb 13, 2023 · Updated 3 years ago
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) ☆179 · Aug 15, 2025 · Updated 10 months ago
- ☆676 · Nov 1, 2024 · Updated last year
- CodeXGLUE ☆1,826 · Apr 23, 2024 · Updated 2 years ago
- 🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023) ☆24 · Oct 10, 2023 · Updated 2 years ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆884 · Jul 16, 2025 · Updated 11 months ago