taisazero / socratic-debugging-benchmark
This repository contains the code and dataset for Socratic Debugging, a novel task in which novice debuggers are questioned Socratically to guide them toward discovering and fixing the bugs in a Python program.
☆15 · Updated 10 months ago
Alternatives and similar repositories for socratic-debugging-benchmark:
Users who are interested in socratic-debugging-benchmark are comparing it to the libraries listed below:
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆47 · Updated 11 months ago
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆32 · Updated 7 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆83 · Updated 6 months ago
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" ☆15 · Updated last year
- Codes and Datasets for our ACL 2023 paper on cognitive reframing of negative thoughts ☆57 · Updated last year
- Apps built using Inspired Cognition's Critique. ☆58 · Updated last year
- ☆31 · Updated last year
- Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024) ☆46 · Updated 4 months ago
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆82 · Updated 6 months ago
- Implementation of the Paper "Goal-Driven Explainable Clustering via Language Descriptions" ☆36 · Updated last year
- A collection of works that investigate social agents, simulations and their real-world impact in text, embodied, and robotics contexts. ☆79 · Updated 8 months ago
- ☆104 · Updated 9 months ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆58 · Updated 2 years ago
- The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies". ☆66 · Updated 2 years ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 · Updated last year
- Code/data for MARG (multi-agent review generation) ☆38 · Updated 3 months ago
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 · Updated last year
- A collection of research papers related to Natural Language Reasoning ☆11 · Updated 2 years ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆42 · Updated 2 months ago
- ☆50 · Updated last year
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support… ☆35 · Updated last year
- ☆85 · Updated last year
- Token-level Reference-free Hallucination Detection ☆94 · Updated last year
- Supporting code for ReCEval paper ☆28 · Updated 5 months ago
- A Large-Scale Dataset for Empathetic Response Generation ☆41 · Updated 10 months ago
- ☆47 · Updated last year
- ☆40 · Updated 9 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Updated 11 months ago
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆33 · Updated 11 months ago
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ☆21 · Updated 2 months ago