taisazero / socratic-debugging-benchmark
This repository contains the code and dataset for Socratic Debugging, a novel task in which novice debuggers are questioned Socratically to guide them toward discovering and fixing bugs in a Python program.
15 stars, updated last year
Alternatives and similar repositories for socratic-debugging-benchmark:
Users who are interested in socratic-debugging-benchmark are comparing it to the repositories listed below.
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… (35 stars, updated 8 months ago)
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" (15 stars, updated last year)
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 (50 stars, updated 3 weeks ago)
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation (47 stars, updated last year)
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models" (56 stars, updated last year)
- Dataset and code for Findings of EMNLP'21 paper "CodeQA: A Question Answering Dataset for Source Code Comprehension". (42 stars, updated last year)
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975 (37 stars, updated last year)
- Token-level Reference-free Hallucination Detection (94 stars, updated last year)
- No description (44 stars, updated last year)
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… (35 stars, updated last year)
- Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023) (60 stars, updated last year)
- Evaluating the Moral Beliefs Encoded in LLMs (24 stars, updated 3 months ago)
- Data and code for the paper "Inducing Positive Perspectives with Text Reframing" (57 stars, updated last year)
- This is the code for the ICLR 2023 paper "Leveraging Large Language Models for Multiple Choice Question Answering." (39 stars, updated 2 years ago)
- Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers? (26 stars, updated last year)
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" (54 stars, updated last year)
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t… (70 stars, updated last year)
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" (73 stars, updated last year)
- No description (43 stars, updated 9 months ago)
- No description (82 stars, updated 2 years ago)
- A dataset of over 10000 question and answer pairs written for storybooks. (36 stars, updated last year)
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) (36 stars, updated 3 months ago)
- No description (11 stars, updated last year)
- No description (91 stars, updated 10 months ago)
- Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024) (51 stars, updated 3 weeks ago)
- Code and data for the FACTOR paper (44 stars, updated last year)
- A Computational Framework for Behavioral Assessment of LLM Therapists (26 stars, updated 5 months ago)
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study (43 stars, updated 2 years ago)
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" (59 stars, updated 2 months ago)
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment (38 stars, updated last year)