eth-nlped / mathdial
🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023
☆50Updated 3 weeks ago
Alternatives and similar repositories for mathdial:
Users who are interested in mathdial are comparing it to the libraries listed below.
- Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024)☆51Updated 3 weeks ago
- Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023)☆60Updated last year
- ☆82Updated 2 years ago
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"☆53Updated 9 months ago
- ☆50Updated last year
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"☆83Updated 7 months ago
- [arXiv preprint] Official Repository for "Evaluating Language Models as Synthetic Data Generators"☆34Updated 3 months ago
- ☆28Updated 2 years ago
- First explanation metric (diagnostic report) for text generation evaluation☆62Updated 3 weeks ago
- ☆45Updated 2 years ago
- Code for the ACL 2023 paper "Fact-Checking Complex Claims with Program-Guided Reasoning"☆30Updated last year
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering.☆17Updated 2 years ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"☆67Updated 11 months ago
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023.☆41Updated last year
- Code base for In-Context Learning for Dialogue State Tracking☆45Updated last year
- Code for Editing Factual Knowledge in Language Models☆136Updated 3 years ago
- ☆32Updated last week
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers☆126Updated last year
- Code for the paper InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning☆99Updated last year
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc…☆31Updated 9 months ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"☆59Updated 2 months ago
- Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)☆58Updated last year
- FRANK: Factuality Evaluation Benchmark☆54Updated 2 years ago
- ☆86Updated last year
- ☆20Updated 2 years ago
- RARR: Researching and Revising What Language Models Say, Using Language Models☆46Updated last year
- Inspecting and Editing Knowledge Representations in Language Models☆112Updated last year
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales"☆15Updated last year
- Code and models for the paper "Questions Are All You Need to Train a Dense Passage Retriever (TACL 2023)"☆62Updated 2 years ago
- The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".☆69Updated 2 years ago