A collection of large question answering datasets
☆433Jul 1, 2024Updated last year
Alternatives and similar repositories for large-qa-datasets
Users that are interested in large-qa-datasets are comparing it to the libraries listed below.
- Repository for "Turkish Wikipedia Based Knowledge Graph (Vikipedi Tabanlı Türkçe Bilgi Çizgesi)" of inzva AI Projects #6☆28May 29, 2021Updated 4 years ago
- ☆149Aug 21, 2023Updated 2 years ago
- Align, a general text alignment function☆15Dec 7, 2023Updated 2 years ago
- Enhancing Complex Question Answering over Knowledge Graphs through Evidence Pattern Retrieval, WWW 2024☆15Oct 22, 2024Updated last year
- A curated list of awesome instruction tuning datasets, models, papers and repositories.☆347Jun 12, 2023Updated 2 years ago
- This repo contains information about FeB4RAG collection☆17Feb 19, 2024Updated 2 years ago
- Code, data, and pretrained models for the paper "Generating Wikipedia Article Sections from Diverse Data Sources"☆20Feb 5, 2021Updated 5 years ago
- Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table☆27Jun 2, 2023Updated 2 years ago
- A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering☆47Apr 5, 2022Updated 4 years ago
- simple QA over knowledge graphs on DBpedia☆25Oct 31, 2018Updated 7 years ago
- Repository for "Building Domain Specific Language Model for NLP Downstream Tasks" of inzva AI Projects #5☆25Jan 18, 2021Updated 5 years ago
- Repository for Teaching Broad Reasoning Skills for Multi-Step QA by Generating Hard Contexts, EMNLP22☆19Jun 23, 2023Updated 2 years ago
- Multilingual abstractive summarization dataset extracted from WikiHow.☆99Mar 14, 2025Updated last year
- A BART version of an open-domain QA model in a closed-book setup☆119Aug 13, 2020Updated 5 years ago
- This repository is a collection of existing KGQA datasets in the form of the 🤗 huggingface datasets -> https://github.com/huggingface/d…☆112Jan 8, 2024Updated 2 years ago
- Materials of public talks given by SJTU X-LANCE members☆14Dec 3, 2022Updated 3 years ago
- Using YouTube to prepare a speech recognition dataset for any language☆10Mar 30, 2021Updated 5 years ago
- A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)☆1,146Jan 4, 2024Updated 2 years ago
- ☆14Mar 17, 2025Updated last year
- ☆11Nov 5, 2021Updated 4 years ago
- An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous Open-domain Questions"☆121Apr 23, 2022Updated 3 years ago
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022)☆16Feb 11, 2023Updated 3 years ago
- Alpaca-lora for huggingface implementation using Deepspeed and FullyShardedDataParallel☆24Apr 3, 2023Updated 3 years ago
- Fusion-in-Decoder☆592Oct 4, 2023Updated 2 years ago
- 🩺 A collection of ChatGPT evaluation reports on various benchmarks.☆50Mar 28, 2023Updated 3 years ago
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.☆2,135Oct 16, 2025Updated 5 months ago
- The source code for running LLMs on the AAAR-1.0 benchmark.☆18Apr 5, 2025Updated last year
- ☆14Feb 9, 2022Updated 4 years ago
- Mapping of the SimpleQuestions dataset to Wikidata☆86Jun 20, 2021Updated 4 years ago
- A simple command line tool to calculate WER for ASR.☆14Oct 14, 2024Updated last year
- ☆30Sep 5, 2021Updated 4 years ago
- MANtIS - a multi-domain information seeking dialogues dataset☆22May 12, 2021Updated 4 years ago
- Steps to perform text-based speaker diarization with the Kaldi toolkit☆12Nov 2, 2018Updated 7 years ago
- Fine-tuning BART on COVID Dialogue Dataset☆17Apr 8, 2020Updated 6 years ago
- NIILC QA data☆18Nov 20, 2015Updated 10 years ago
- Expanding natural instructions☆1,039Dec 11, 2023Updated 2 years ago
- Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering☆175Jun 6, 2021Updated 4 years ago
- Dense Passage Retriever is a set of tools and models for the open-domain Q&A task.☆1,863Apr 6, 2023Updated 3 years ago
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels …☆285Aug 19, 2023Updated 2 years ago