taisazero / socratic-debugging-benchmark
This repository contains the code and dataset for Socratic Debugging, a novel task in which novice debuggers are questioned Socratically to guide them toward discovering and fixing the bugs in a buggy Python program.
☆ 18 · Updated last year
Alternatives and similar repositories for socratic-debugging-benchmark
Users who are interested in socratic-debugging-benchmark are comparing it to the libraries listed below.
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… · ☆ 45 · Updated last year
- ☆ 116 · Updated last year
- Inspecting and Editing Knowledge Representations in Language Models · ☆ 119 · Updated 2 years ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". · ☆ 165 · Updated last year
- ☆ 51 · Updated last year
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models" · ☆ 56 · Updated 2 years ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" · ☆ 100 · Updated 2 years ago
- Repository for the Bias Benchmark for QA dataset. · ☆ 134 · Updated 2 years ago
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" · ☆ 109 · Updated 2 years ago
- Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers? · ☆ 29 · Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" · ☆ 54 · Updated last year
- Data and code for the paper "Inducing Positive Perspectives with Text Reframing" · ☆ 61 · Updated 2 years ago
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback · ☆ 207 · Updated 2 years ago
- ☆ 67 · Updated 3 years ago
- ☆ 100 · Updated last year
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t… · ☆ 74 · Updated last year
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers · ☆ 136 · Updated last year
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" · ☆ 87 · Updated last year
- Token-level Reference-free Hallucination Detection · ☆ 97 · Updated 2 years ago
- Source code for the paper "Active Prompting with Chain-of-Thought for Large Language Models" · ☆ 248 · Updated last year
- ☆ 47 · Updated 3 months ago
- ☆ 43 · Updated last year
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 · ☆ 72 · Updated 4 months ago
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper · ☆ 85 · Updated 4 years ago
- This is the code for the ICLR 2023 paper "Leveraging Large Language Models for Multiple Choice Question Answering." · ☆ 41 · Updated 2 years ago
- DialOp: Decision-oriented dialogue environments for collaborative language agents · ☆ 111 · Updated last year
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ · ☆ 26 · Updated 10 months ago
- PASTA: Post-hoc Attention Steering for LLMs · ☆ 133 · Updated last year
- The Prism Alignment Project · ☆ 87 · Updated last year
- [NeurIPS 2023] Codebase for the paper: "Guiding Large Language Models with Directional Stimulus Prompting" · ☆ 111 · Updated 2 years ago