TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and without the use of translation, and is designed for the training and evaluation of automatic question answering systems. This repository provides evaluation code and a baseline system for the dataset.
☆317May 28, 2020Updated 5 years ago
Alternatives and similar repositories for tydiqa
Users that are interested in tydiqa are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- New dataset☆311Aug 31, 2021Updated 4 years ago
- This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering".☆80Jun 3, 2021Updated 4 years ago
- ReConsider is a re-ranking model that re-ranks the top-K (passage, answer-span) predictions of an Open-Domain QA Model like DPR (Karpukhi…☆49Apr 26, 2021Updated 4 years ago
- Dataset and baseline for ACL 2019 paper "XQA: A Cross-lingual Open-domain Question Answering Dataset"☆89Nov 16, 2021Updated 4 years ago
- ☆208Nov 12, 2021Updated 4 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is design…☆1,113Jul 30, 2021Updated 4 years ago
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty…☆652Jan 4, 2023Updated 3 years ago
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages.☆80Feb 16, 2022Updated 4 years ago
- Progressively Pretrained Dense Corpus Index for Open-Domain QA and Information Retrieval☆43Jun 12, 2023Updated 2 years ago
- This dataset contains human judgements about answer equivalence. The data is based on SQuAD (Stanford Question Answering Dataset), and co…☆28Oct 24, 2022Updated 3 years ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest…☆31Jul 9, 2020Updated 5 years ago
- The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".☆28Jun 19, 2021Updated 4 years ago
- We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically …☆192Jun 16, 2022Updated 3 years ago
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an…☆566Jan 4, 2022Updated 4 years ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- QED: A Framework and Dataset for Explanations in Question Answering☆119Aug 3, 2021Updated 4 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)☆393Nov 7, 2023Updated 2 years ago
- Library for Knowledge Intensive Language Tasks☆972Mar 31, 2022Updated 4 years ago
- Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering☆175Jun 6, 2021Updated 4 years ago
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators☆2,373Mar 23, 2024Updated 2 years ago
- ☆1,296Dec 15, 2022Updated 3 years ago
- DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue☆286Jul 6, 2023Updated 2 years ago
- Dense Passage Retriever - is a set of tools and models for open domain Q&A task.☆1,863Apr 6, 2023Updated 3 years ago
- Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.☆1,754Dec 20, 2023Updated 2 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- This is the official implementation of NeurIPS 2021 "One Question Answering Model for Many Languages with Cross-lingual Dense Passage Ret…☆71Apr 1, 2022Updated 4 years ago
- UnifiedQA: Crossing Format Boundaries With a Single QA System☆445May 9, 2022Updated 3 years ago
- An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous Open-domain Questions"☆121Apr 23, 2022Updated 3 years ago
- A BART version of an open-domain QA model in a closed-book setup☆119Aug 13, 2020Updated 5 years ago
- Shared repository for open-sourced projects from the Google AI Language team.☆1,765Mar 25, 2026Updated 2 weeks ago
- Neural text-to-text question generation☆216Nov 13, 2020Updated 5 years ago
- This repository contains the source code and links to some datasets used in the CoNLL 2019 paper "Learning to Represent Bilingual Diction…☆12Oct 1, 2020Updated 5 years ago
- The official implementation of ICLR 2020, "Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering".☆437Jul 25, 2024Updated last year
- LAnguage Model Analysis☆1,389Jul 7, 2024Updated last year
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation"☆40Jan 2, 2019Updated 7 years ago
- Code associated with the Don't Stop Pretraining ACL 2020 paper☆541Nov 15, 2021Updated 4 years ago
- ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: giv…☆462Sep 11, 2024Updated last year
- Python-based implementation of the Translate-Align-Retrieve method to automatically translate the SQuAD Dataset to Spanish.☆59Dec 8, 2022Updated 3 years ago
- Evaluation framework for open-domain question answering.☆20May 16, 2021Updated 4 years ago
- A Corpus for Multilingual Document Classification in Eight Languages.☆153Jun 6, 2022Updated 3 years ago
- BLEURT is a metric for Natural Language Generation based on transfer learning.☆790Aug 4, 2023Updated 2 years ago