taisazero / socratic-debugging-benchmark
This repository contains the code and dataset for Socratic Debugging, a novel task in which novice debuggers are questioned Socratically to guide them toward discovering and fixing the bugs in a Python program.
☆18 Updated last year
Alternatives and similar repositories for socratic-debugging-benchmark
Users who are interested in socratic-debugging-benchmark are comparing it to the repositories listed below.
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆41 Updated 11 months ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆55 Updated 3 months ago
- ☆44 Updated 9 months ago
- ☆42 Updated last year
- Evaluate the Quality of Critique ☆35 Updated last year
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support… ☆37 Updated last year
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆61 Updated 2 years ago
- ☆43 Updated 10 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆116 Updated last year
- Code/data for MARG (multi-agent review generation) ☆44 Updated 7 months ago
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆55 Updated last year
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆60 Updated 2 years ago
- ☆66 Updated 3 years ago
- Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers? ☆28 Updated last year
- Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023) ☆64 Updated last year
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆85 Updated 10 months ago
- Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI ☆94 Updated 2 years ago
- Supporting code for ReCEval paper ☆28 Updated 9 months ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ☆38 Updated 2 years ago
- ☆28 Updated last year
- ☆25 Updated last year
- This repository contains data, code and models for contextual noncompliance. ☆23 Updated 11 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆70 Updated last year
- Dataset and code for Findings of EMNLP'21 paper "CodeQA: A Question Answering Dataset for Source Code Comprehension". ☆40 Updated last year
- ☆106 Updated last year
- ☆95 Updated last year
- Evaluating the Moral Beliefs Encoded in LLMs ☆26 Updated 6 months ago
- Tasks for describing differences between text distributions. ☆16 Updated 10 months ago
- Code for the ACL 2023 long paper - Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering ☆37 Updated 2 years ago
- ☆21 Updated last year