taisazero / socratic-debugging-benchmark
This repository contains the code and dataset for Socratic Debugging, a novel task in which novice debuggers are guided through Socratic questioning toward discovering and fixing the bugs in a Python program.
☆19 Updated last year
Alternatives and similar repositories for socratic-debugging-benchmark
Users who are interested in socratic-debugging-benchmark are comparing it to the repositories listed below
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆45 Updated last year
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆165 Updated last year
- ☆116 Updated last year
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆70 Updated 3 months ago
- ☆100 Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆119 Updated 2 years ago
- ☆50 Updated last year
- ☆47 Updated 2 months ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆60 Updated 10 months ago
- This repository contains data, code and models for contextual noncompliance. ☆24 Updated last year
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models" ☆56 Updated 2 years ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆86 Updated last year
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper… ☆128 Updated last year
- Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought" ☆96 Updated last year
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆135 Updated last year
- Data and code for the paper "Inducing Positive Perspectives with Text Reframing" ☆61 Updated 2 years ago
- Token-level Reference-free Hallucination Detection ☆97 Updated 2 years ago
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t… ☆72 Updated last year
- A set of utilities for running few-shot prompting experiments on large-language models ☆126 Updated 2 years ago
- Repository for research in the field of Responsible NLP at Meta. ☆204 Updated 7 months ago
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" ☆110 Updated 2 years ago
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆71 Updated 2 years ago
- ☆47 Updated last year
- The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies". ☆81 Updated 3 years ago
- The Prism Alignment Project ☆87 Updated last year
- PASTA: Post-hoc Attention Steering for LLMs ☆131 Updated last year
- Repository for the Bias Benchmark for QA dataset. ☆133 Updated last year
- ☆37 Updated 4 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- ☆173 Updated 2 years ago