taisazero / socratic-debugging-benchmark
This repository contains the code and dataset for Socratic Debugging, a novel task in which novice debuggers are questioned Socratically to guide them toward discovering and fixing the bugs in a Python program.
☆17 Updated last year
Alternatives and similar repositories for socratic-debugging-benchmark:
Users who are interested in socratic-debugging-benchmark are comparing it to the libraries listed below:
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆50 Updated 3 weeks ago
- ☆41 Updated 7 months ago
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆36 Updated 8 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆54 Updated 10 months ago
- Supporting code for ReCEval paper ☆28 Updated 6 months ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆58 Updated 2 years ago
- ☆43 Updated 9 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆73 Updated last year
- ☆20 Updated 10 months ago
- A collection of works that investigate social agents, simulations and their real-world impact in text, embodied, and robotics contexts. ☆83 Updated 9 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆83 Updated 7 months ago
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support… ☆36 Updated last year
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" ☆15 Updated last year
- Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) ☆86 Updated 2 years ago
- ☆91 Updated 10 months ago
- ☆37 Updated 4 months ago
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆59 Updated last year
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆44 Updated 3 months ago
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t… ☆70 Updated last year
- Public repository for "Think Twice: Perspective-Taking Improves Large Language Models’ Theory-of-Mind Capabilities". ☆17 Updated last year
- This repo contains code for our NeurIPS 2023 spotlight paper: Evaluating and Inducing Personality in Pre-trained Language Models ☆51 Updated last year
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets" ☆52 Updated 9 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆36 Updated 3 months ago
- ☆47 Updated last year
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" ☆109 Updated last year
- Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency ☆35 Updated 2 months ago
- Code/data for MARG (multi-agent review generation) ☆41 Updated 4 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆61 Updated 10 months ago
- Evaluating the Moral Beliefs Encoded in LLMs ☆24 Updated 3 months ago