eth-nlped / mathdial
🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023
★67 · Updated last month
Alternatives and similar repositories for mathdial
Users that are interested in mathdial are comparing it to the libraries listed below
- ★189 · Updated 4 months ago
- NAACL 2021: Are NLP Models really able to Solve Simple Math Word Problems? ★134 · Updated 3 years ago
- Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024) ★55 · Updated last month
- RARR: Researching and Revising What Language Models Say, Using Language Models ★49 · Updated 2 years ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ★149 · Updated 2 months ago
- ★79 · Updated last year
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ★138 · Updated last year
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023. ★41 · Updated last year
- Awesome LLM for NLG Evaluation Papers ★25 · Updated last year
- ★38 · Updated 2 years ago
- Token-level Reference-free Hallucination Detection ★96 · Updated 2 years ago
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] ★16 · Updated last year
- Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023) ★64 · Updated last year
- ACL 2023: Evaluating Open-Domain Question Answering in the Era of Large Language Models ★47 · Updated last year
- Multilingual Large Language Models Evaluation Benchmark ★132 · Updated last year
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ★397 · Updated 6 months ago
- ★49 · Updated 2 years ago
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering. ★16 · Updated 2 years ago
- HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation. ★35 · Updated last year
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task. ★150 · Updated last month
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ★135 · Updated last year
- ★116 · Updated last year
- Repository for the Bias Benchmark for QA dataset. ★129 · Updated last year
- ★293 · Updated last year
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ★77 · Updated last year
- Codes for papers on Large Language Models Personalization (LaMP) ★175 · Updated 8 months ago
- Inspecting and Editing Knowledge Representations in Language Models ★119 · Updated 2 years ago
- ★51 · Updated 2 years ago
- Code and models for the paper "Questions Are All You Need to Train a Dense Passage Retriever (TACL 2023)" ★62 · Updated 2 years ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ★59 · Updated last year