taisazero / socratic-debugging-benchmark
This repository contains the code and dataset for Socratic Debugging, a novel task in which novice debuggers are guided through Socratic questioning toward discovering and fixing bugs in a Python program.
☆13 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for socratic-debugging-benchmark
- ☆86 · Updated 5 months ago
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆29 · Updated 4 months ago
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆77 · Updated 3 months ago
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" ☆14 · Updated last year
- A collection of works that investigate social agents, simulations and their real-world impact in text, embodied, and robotics contexts. ☆63 · Updated 5 months ago
- Apps built using Inspired Cognition's Critique. ☆58 · Updated last year
- ☆35 · Updated last year
- 💻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆51 · Updated 5 months ago
- Evaluating the Moral Beliefs Encoded in LLMs ☆21 · Updated 9 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆62 · Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆49 · Updated 8 months ago
- Tasks for describing differences between text distributions. ☆16 · Updated 3 months ago
- Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) ☆86 · Updated last year
- A set of utilities for running few-shot prompting experiments on large-language models ☆113 · Updated last year
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆45 · Updated 8 months ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆37 · Updated last month
- Zero-shot evaluation on LEXGLUE tasks with GPT-3.5 ☆27 · Updated last year
- Source code and data for The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code (Findings of ACL 2023… ☆29 · Updated last year
- ☆22 · Updated last year
- ☆25 · Updated last week
- ☆36 · Updated 3 months ago
- Implementation of the Paper "Goal-Driven Explainable Clustering via Language Descriptions" ☆35 · Updated last year
- [EACL 2023] CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification ☆38 · Updated last year
- ☆33 · Updated last year
- A Computational Framework for Behavioral Assessment of LLM Therapists ☆22 · Updated last month
- The Prism Alignment Project ☆37 · Updated 6 months ago
- Data and code for the paper "Inducing Positive Perspectives with Text Reframing" ☆54 · Updated last year
- Supporting code for ReCEval paper ☆26 · Updated 2 months ago
- ☆36 · Updated 5 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆50 · Updated 6 months ago