taisazero / socratic-debugging-benchmark
This repository contains the code and dataset for Socratic Debugging, a novel task of Socratically questioning novice debuggers to guide them toward discovering and fixing bugs in a Python program.
⭐13 · Updated 10 months ago
Alternatives and similar repositories for socratic-debugging-benchmark:
Users interested in socratic-debugging-benchmark are comparing it to the repositories listed below
- ⭐90 · Updated 7 months ago
- NAACL 2024. Code & dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ⭐31 · Updated 6 months ago
- Code and datasets for our ACL 2023 paper on cognitive reframing of negative thoughts ⭐56 · Updated last year
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ⭐45 · Updated 10 months ago
- Code and data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ⭐82 · Updated 5 months ago
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ⭐82 · Updated 5 months ago
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" ⭐15 · Updated last year
- PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals (EMNLP 2024) ⭐52 · Updated 2 months ago
- ⭐38 · Updated 7 months ago
- ⭐38 · Updated 5 months ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc… ⭐31 · Updated 7 months ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ⭐41 · Updated last month
- Code for the arXiv paper "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ⭐59 · Updated 9 months ago
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ⭐21 · Updated last month
- Code for "Democratizing Reasoning Ability: Tailored Learning from Large Language Model", EMNLP 2023 ⭐31 · Updated last year
- The LM Contamination Index is a manually created database of contamination evidence for LMs. ⭐76 · Updated 9 months ago
- Supporting code for the ReCEval paper ⭐27 · Updated 4 months ago
- The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies". ⭐65 · Updated 2 years ago
- A Computational Framework for Behavioral Assessment of LLM Therapists ⭐24 · Updated 3 months ago
- A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets ⭐15 · Updated last year
- ⭐23 · Updated last year
- ⭐85 · Updated last year
- ⭐17 · Updated 3 months ago
- ⭐33 · Updated last year
- Zero-shot evaluation on LexGLUE tasks with GPT-3.5 ⭐27 · Updated last year
- ⭐20 · Updated last year
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ⭐14 · Updated last year
- ⭐52 · Updated last year
- Code and data for the paper "Context-faithful Prompting for Large Language Models" ⭐39 · Updated last year
- Code for the ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets" ⭐50 · Updated 7 months ago