eth-nlped / mathdial
🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023
☆55 Updated 3 months ago
Alternatives and similar repositories for mathdial
Users who are interested in mathdial are comparing it to the repositories listed below.
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆41 Updated 11 months ago
- Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024) ☆53 Updated last month
- First explanation metric (diagnostic report) for text generation evaluation ☆62 Updated 3 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆116 Updated last year
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023. ☆41 Updated last year
- RARR: Researching and Revising What Language Models Say, Using Language Models ☆47 Updated 2 years ago
- ☆18 Updated last year
- ☆82 Updated 2 years ago
- HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation. ☆34 Updated 8 months ago
- Awesome LLM for NLG Evaluation Papers ☆24 Updated last year
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆69 Updated last year
- Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆55 Updated last year
- Token-level Reference-free Hallucination Detection ☆94 Updated last year
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆130 Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆124 Updated 10 months ago
- NAACL 2021: Are NLP Models really able to Solve Simple Math Word Problems? ☆130 Updated 2 years ago
- ☆22 Updated 3 years ago
- A unified benchmark for math reasoning ☆88 Updated 2 years ago
- [arXiv preprint] Official Repository for "Evaluating Language Models as Synthetic Data Generators" ☆33 Updated 6 months ago
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ☆24 Updated 3 months ago
- ☆28 Updated last year
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆40 Updated 2 years ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆60 Updated 2 years ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆59 Updated last year
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆84 Updated 10 months ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision ☆89 Updated 7 months ago
- Code and dataset for the paper: Generating Literal and Implied Subquestions to Fact-check Complex Claims ☆26 Updated 2 years ago
- ☆15 Updated 2 years ago
- A comprehensive paper list of Reasoning over Tables. ☆29 Updated 2 years ago
- ☆106 Updated last year