taisazero / socratic-debugging-benchmark
This repository contains the code and dataset for Socratic Debugging, a novel task in which novice debuggers are questioned Socratically to guide them toward discovering and fixing the bugs in a Python program.
☆18 · Updated last year
Alternatives and similar repositories for socratic-debugging-benchmark
Users who are interested in socratic-debugging-benchmark are comparing it to the repositories listed below.
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆43 · Updated last year
- Codes and Datasets for our ACL 2023 paper on cognitive reframing of negative thoughts ☆64 · Updated last year
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆86 · Updated last year
- Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) ☆86 · Updated 2 years ago
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models" ☆56 · Updated 2 years ago
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" ☆109 · Updated 2 years ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆81 · Updated last year
- A set of utilities for running few-shot prompting experiments on large-language models ☆122 · Updated last year
- DialOp: Decision-oriented dialogue environments for collaborative language agents ☆109 · Updated 9 months ago
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t… ☆70 · Updated last year
- The corresponding code from our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning… ☆12 · Updated 11 months ago
- PACIFIC: Towards Proactive Conversational Question Answering over Tabular and Textual Data in Finance ☆14 · Updated last year
- [NeurIPS 2023] PyTorch code for Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind ☆66 · Updated last year
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆37 · Updated 7 months ago
- Byte-sized text games for code generation tasks on virtual environments ☆19 · Updated last year
- Token-level Reference-free Hallucination Detection ☆96 · Updated 2 years ago
- Data and code for the paper "Inducing Positive Perspectives with Text Reframing" ☆61 · Updated 2 years ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Updated last year
- The data and the PyTorch implementation for the models and experiments in the paper "Exploiting Asymmetry for Synthetic Training Data Gen… ☆63 · Updated 2 years ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 · Updated 6 months ago
- Benchmarking Generalization to New Tasks from Natural Language Instructions ☆26 · Updated 4 years ago
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆14 · Updated 2 years ago
- SummScreen: A Dataset for Abstractive Screenplay Summarization (ACL 2022) ☆37 · Updated 3 years ago
- The Synthetic-Persona-Chat dataset is a synthetically generated persona-based dialogue dataset. It extends the original Persona-Chat data… ☆98 · Updated last year
- [NeurIPS 2023 Main Track] This is the repository for the paper titled "Don’t Stop Pretraining? Make Prompt-based Fine-tuning Powerful Lea… ☆74 · Updated last year
- EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), framework for evaluating quantitative reasoning ability in… ☆14 · Updated 3 years ago