eth-nlped / mathdial
🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023
☆45Updated 8 months ago
Related projects
Alternatives and complementary repositories for mathdial
- ☆80Updated last year
- Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024)☆37Updated last month
- Code for the paper InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning☆97Updated last year
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)"☆96Updated last year
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers☆122Updated 8 months ago
- ☆26Updated last year
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision☆81Updated 3 weeks ago
- ☆29Updated last year
- ☆44Updated last year
- NAACL 2021: Are NLP Models really able to Solve Simple Math Word Problems?☆118Updated 2 years ago
- ☆83Updated last year
- First explanation metric (diagnostic report) for text generation evaluation☆61Updated 4 months ago
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"☆51Updated 5 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"☆78Updated 3 months ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc…☆28Updated 5 months ago
- ☆69Updated last year
- Code base of In-Context Learning for Dialogue State Tracking☆44Updated last year
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages☆37Updated last month
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners☆111Updated 2 months ago
- Code and models for the paper "Questions Are All You Need to Train a Dense Passage Retriever (TACL 2023)"☆60Updated last year
- Benchmarking Commonsense Reasoning in Real-World Tasks☆13Updated 11 months ago
- ☆48Updated last year
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation".☆71Updated this week
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval☆57Updated last month
- The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".☆63Updated 2 years ago
- ☆23Updated last year
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se…☆55Updated last year
- RARR: Researching and Revising What Language Models Say, Using Language Models☆43Updated last year
- [NAACL 2024] End-to-End Beam Retrieval for Multi-Hop Question Answering☆77Updated 7 months ago
- ☆94Updated 6 months ago