taisazero / socratic-debugging-benchmarkLinks
The repository contains the code and dataset for the Socratic Debugging task which is a novel task for Socratically Questioning Novice Debuggers to guide them towards discovering and fixing a buggy python program.
โ19Updated last year
Alternatives and similar repositories for socratic-debugging-benchmark
Users that are interested in socratic-debugging-benchmark are comparing it to the libraries listed below
Sorting:
- NAACL 2024. Code & Dataset for "๐ Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakeโฆโ45Updated last year
- The official repo for SocKET: Social Knowledge Evaluation Testsโ24Updated 7 months ago
- ๐งฎ MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023โ71Updated 3 months ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".โ165Updated last year
- Codes and Datasets for our ACL 2023 paper on cognitive reframing of negative thoughtsโ66Updated 2 years ago
- Data and code for the paper "Inducing Positive Perspectives with Text Reframing"โ61Updated 2 years ago
- Inspecting and Editing Knowledge Representations in Language Modelsโ119Updated 2 years ago
- โ100Updated last year
- โ50Updated last year
- โ116Updated last year
- ๐ป Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"โ57Updated last year
- Token-level Reference-free Hallucination Detectionโ97Updated 2 years ago
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners"โ110Updated 2 years ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"โ61Updated 11 months ago
- This repo contains code for our NeurIPS 2023 spotlight paper: Evaluating and Inducing Personality in Pre-trained Language Modelsโ56Updated 2 years ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"โ86Updated last year
- [NeurIPS 2023] Codebase for the paper: "Guiding Large Language Models with Directional Stimulus Prompting"โ112Updated 2 years ago
- โ29Updated 3 years ago
- โ47Updated 2 months ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answersโ136Updated last year
- Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022)โ86Updated 2 years ago
- โ88Updated 2 years ago
- Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers?โ29Updated last year
- Code for Arxiv 2023: Improving Language Model Negociation with Self-Play and In-Context Learning from AI Feedbackโ208Updated 2 years ago
- An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutorsโ24Updated last week
- About The corresponding code from our paper " REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate tโฆโ72Updated last year
- The official repository of "ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models"โ45Updated 2 years ago
- A Computational Framework for Behavioral Assessment of LLM Therapistsโ37Updated last year
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paperโฆโ128Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"โ54Updated last year