taisazero / socratic-debugging-benchmark
This repository contains the code and dataset for Socratic Debugging, a novel task in which novice debuggers are questioned Socratically to guide them toward discovering and fixing the bugs in a buggy Python program.
☆ 11, updated 5 months ago
Related projects:
- Code and datasets for our ACL 2023 paper on cognitive reframing of negative thoughts (☆ 51, updated last year)
- Byte-sized text games for code generation tasks in virtual environments (☆ 17, updated 2 months ago)
- Code and data for the EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics" (☆ 16, updated 2 years ago)
- [EACL 2023] CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification (☆ 35, updated last year)
- Apps built using Inspired Cognition's Critique (☆ 58, updated last year)
- DialOp: Decision-oriented dialogue environments for collaborative language agents (☆ 97, updated 2 months ago)
- Tasks for describing differences between text distributions (☆ 15, updated last month)
- LLM Dynamic Planner: combining an LLM with PDDL planners to solve an embodied task (☆ 33, updated last week)
- Repository for "When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment" (☆ 37, updated last year)
- Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) (☆ 85, updated last year)
- [ACL 2024 NLP4ConvAI Oral] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system m… (☆ 33, updated 3 months ago)
- Code and dataset for "Learning to Solve Complex Tasks by Talking to Agents" (☆ 21, updated 2 years ago)
- HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation (☆ 25, updated last month)
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" (☆ 59, updated 10 months ago)
- Code and data for MARG (multi-agent review generation) (☆ 24, updated 4 months ago)
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… (☆ 12, updated last year)
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation (☆ 42, updated 8 months ago)
- Code, datasets, and models for the paper "Automatic Evaluation of Attribution by Large Language Models" (☆ 51, updated last year)
- A Computational Framework for Behavioral Assessment of LLM Therapists (☆ 18, updated 7 months ago)
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model (☆ 40, updated 8 months ago)
- Code and data for the NAACL 2024 paper "MacGyver: Are Large Language Models Creative Problem Solvers?" (☆ 21, updated 5 months ago)