google-research-datasets / answer-equivalence-dataset
This dataset contains human judgements about answer equivalence. The data is based on SQuAD (Stanford Question Answering Dataset), and contains 9k human judgements of answer candidates generated by Albert on the SQuAD train set, and an additional 14k human judgements for answer candidates produced by BiDAF, Luke, and XLNet on the SQuAD dev set.
☆22Updated 2 years ago
Alternatives and similar repositories for answer-equivalence-dataset:
Users that are interested in answer-equivalence-dataset are comparing it to the libraries listed below
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"☆26Updated 3 years ago
- ☆28Updated last year
- EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering☆38Updated 3 years ago
- ReConsider is a re-ranking model that re-ranks the top-K (passage, answer-span) predictions of an Open-Domain QA Model like DPR (Karpukhi…☆49Updated 3 years ago
- ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding " by Dongyeop Kang and Eduard Hovy☆13Updated 3 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"☆30Updated 2 years ago
- ☆40Updated 3 years ago
- Code to support the paper "Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets"☆66Updated 3 years ago
- An original implementation of the paper "CREPE: Open-Domain Question Answering with False Presuppositions"☆14Updated 2 months ago
- This is a repository for the paper on testing inductive bias with scaled-down RoBERTa models.☆20Updated 3 years ago
- The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".☆27Updated 3 years ago
- ☆25Updated last year
- The official implemetation of "Evidentiality-guided Generation for Knowledge-Intensive NLP Tasks" (NAACL 2022).☆43Updated 2 years ago
- Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"☆22Updated 3 years ago
- Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics".☆16Updated 2 years ago
- ☆15Updated 3 years ago
- Code and dataset for the EMNLP 2021 Finding paper "Can NLI Models Verify QA Systems’ Predictions?"☆25Updated last year
- Faithfulness and factuality annotations of XSum summaries from our paper "On Faithfulness and Factuality in Abstractive Summarization" (h…☆81Updated 4 years ago
- FRANK: Factuality Evaluation Benchmark☆52Updated 2 years ago
- Pretraining scripts for BART transformer model☆11Updated last year
- ☆92Updated 3 years ago
- This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering".☆79Updated 3 years ago
- Code repo for SIGIR 2021 paper "Few-Shot Conversational Dense Retrieval"☆41Updated 3 years ago
- ☆67Updated 3 years ago
- ☆24Updated last year
- REALSumm: Re-evaluating Evaluation in Text Summarization☆71Updated 2 years ago
- ☆13Updated 2 years ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest…☆31Updated 4 years ago
- NLQuAD: A Non-Factoid Long Question Answering Data Set. To be published at EACL2021☆14Updated 3 years ago
- Pre-training BART in Flax on The Pile dataset☆20Updated 3 years ago