microsoft / DataScienceProblems
A repository containing the Jupyter notebook code generation benchmark.
☆61 · Updated 3 years ago
Alternatives and similar repositories for DataScienceProblems
Users interested in DataScienceProblems are comparing it to the repositories listed below.
- Official code release for the paper "Coder Reviewer Reranking for Code Generation" ☆45 · Updated 2 years ago
- Code for generating the JuICe dataset ☆37 · Updated 3 years ago
- Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible. ☆44 · Updated 7 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆49 · Updated last year
- ☆78 · Updated 6 months ago
- Code for paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23) ☆90 · Updated 2 years ago
- ☆49 · Updated last year
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs" ☆28 · Updated 3 years ago
- ☆54 · Updated 2 years ago
- Code for our paper "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models" ☆57 · Updated 2 years ago
- Code for the NLP4Prog workshop paper "Reading StackOverflow Encourages Cheating: Adding Question Text Improves Extractive Code Generation" ☆21 · Updated 4 years ago
- Web queries dataset for code search ☆32 · Updated 2 years ago
- ☆38 · Updated 3 years ago
- ☆119 · Updated last year
- Google Research ☆46 · Updated 2 years ago
- Code, datasets, and results of the ChatGPT evaluation presented in the paper "ChatGPT: Jack of all trades, master of none" ☆29 · Updated 2 years ago
- A diff tool for language models ☆44 · Updated last year
- A plugin for code generation in PyCharm/IntelliJ using tranX ☆36 · Updated 2 weeks ago
- ☆29 · Updated last year
- Dataset and code for the Findings of EMNLP'21 paper "CodeQA: A Question Answering Dataset for Source Code Comprehension" ☆42 · Updated last year
- This project shows how to derive the total number of training tokens in a large text dataset from 🤗 datasets with Apache Beam and Data… ☆27 · Updated 2 years ago
- Official implementation for the paper "StackEval: Benchmarking LLMs in Coding Assistance", https://arxiv.org/abs/2412.05288 ☆17 · Updated 11 months ago
- A categorical archive of ChatGPT failures ☆64 · Updated 2 years ago
- ☆29 · Updated 2 years ago
- A unified benchmark for math reasoning ☆88 · Updated 2 years ago
- Foundation Models for Data Tasks ☆109 · Updated 2 years ago
- Evaluation suite for large-scale language models ☆128 · Updated 4 years ago
- Code for the arXiv paper "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 · Updated 8 months ago
- [NAACL 2024] Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data? https://aclanthology.org/2024.naa… ☆55 · Updated 2 months ago
- ☆54 · Updated 2 years ago