[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".
☆269Oct 30, 2024Updated last year
Alternatives and similar repositories for DS-1000
Users that are interested in DS-1000 are comparing it to the libraries listed below.
- ☆34Mar 21, 2026Updated 3 weeks ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation☆49Dec 22, 2023Updated 2 years ago
- A framework for the evaluation of autoregressive code generation language models.☆1,035Jul 22, 2025Updated 8 months ago
- ☆17Dec 9, 2022Updated 3 years ago
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI☆495Jan 3, 2026Updated 3 months ago
- [ICLR 2023] Code for the paper "Binding Language Models in Symbolic Languages"☆324Aug 25, 2023Updated 2 years ago
- Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024☆1,713Oct 2, 2025Updated 6 months ago
- 🐙 OctoPack: Instruction Tuning Code Large Language Models☆478Feb 5, 2025Updated last year
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners"☆109Jul 15, 2023Updated 2 years ago
- code for "Natural Language to Code Translation with Execution"☆41Nov 2, 2022Updated 3 years ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation☆169Oct 11, 2024Updated last year
- Code for the paper "Evaluating Large Language Models Trained on Code"☆3,191Jan 17, 2025Updated last year
- Code for generating the JuICe dataset.☆37Oct 27, 2021Updated 4 years ago
- APPS: Automated Programming Progress Standard (NeurIPS 2021)☆524Jun 19, 2024Updated last year
- CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering☆23Feb 26, 2021Updated 5 years ago
- [EMNLP 2022] Unifying and multi-tasking structured knowledge grounding with language models☆568Aug 22, 2023Updated 2 years ago
- An MBTI test on large language models like GPT-3.☆27May 2, 2022Updated 3 years ago
- Contests based Dataset for Code Generation☆13Dec 11, 2022Updated 3 years ago
- ☆10Apr 15, 2023Updated 3 years ago
- Paper collections of methods that use language to interact with an environment, including interacting with the real world, simulated worlds, or the WWW…☆129Jul 26, 2023Updated 2 years ago
- Lyra: A Benchmark for Turducken-Style Code Generation☆15Apr 22, 2022Updated 3 years ago
- A multi-programming language benchmark for LLMs☆301Updated this week
- ☆54Aug 25, 2023Updated 2 years ago
- ☆14Aug 18, 2022Updated 3 years ago
- [ACL '24] Source code for the paper "INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing"☆30Nov 25, 2024Updated last year
- [ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"☆25Mar 28, 2024Updated 2 years ago
- ☆19Aug 9, 2024Updated last year
- Mapping Language to Code in a Programmatic Context☆80Jan 27, 2021Updated 5 years ago
- Official repository of the paper: Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code (Findings of EACL …☆12Mar 26, 2026Updated 3 weeks ago
- The CodeInsight dataset is designed for code generation tasks, providing developers with expert-curated examples that bridge the gap betw…☆14Oct 22, 2024Updated last year
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents☆556Oct 28, 2023Updated 2 years ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]☆100Apr 9, 2025Updated last year
- A repository containing the Jupyter notebook code generation benchmark.☆59Feb 9, 2022Updated 4 years ago
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)☆177Aug 15, 2025Updated 8 months ago
- Official repository for the paper "COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis".☆17Feb 19, 2025Updated last year
- [EACL'23] MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages☆23Feb 13, 2023Updated 3 years ago
- ☆674Nov 1, 2024Updated last year
- CodeXGLUE☆1,815Apr 23, 2024Updated last year
- 🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023)☆24Oct 10, 2023Updated 2 years ago