A collection of large question answering datasets
☆434 · Jul 1, 2024 · Updated last year
Alternatives and similar repositories for large-qa-datasets
Users who are interested in large-qa-datasets are comparing it to the repositories listed below.
- WebQuestions QA Benchmarking Dataset ☆176 · May 27, 2016 · Updated 9 years ago
- ACL 2026 & NAACL 2025: Bridging Retrieval and Inference through Evidence Fusion ☆13 · Apr 9, 2026 · Updated last month
- ☆11 · May 1, 2022 · Updated 4 years ago
- ☆153 · Aug 21, 2023 · Updated 2 years ago
- Align, a general text alignment function ☆15 · Dec 7, 2023 · Updated 2 years ago
- A curated list of awesome instruction tuning datasets, models, papers and repositories. ☆347 · Jun 12, 2023 · Updated 2 years ago
- This repo contains information about the FeB4RAG collection ☆17 · Feb 19, 2024 · Updated 2 years ago
- Code, data, and pretrained models for the paper "Generating Wikipedia Article Sections from Diverse Data Sources" ☆21 · Feb 5, 2021 · Updated 5 years ago
- Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table ☆28 · Jun 2, 2023 · Updated 2 years ago
- MoviE Text Audio QA (MetaQA): a benchmark dataset for question answering ☆102 · Oct 10, 2021 · Updated 4 years ago
- A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering ☆49 · Apr 5, 2022 · Updated 4 years ago
- The official implementation of the paper: H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs ☆62 · Jan 14, 2026 · Updated 3 months ago
- Repository for Teaching Broad Reasoning Skills for Multi-Step QA by Generating Hard Contexts, EMNLP22 ☆19 · Jun 23, 2023 · Updated 2 years ago
- Multilingual abstractive summarization dataset extracted from WikiHow. ☆99 · Mar 14, 2025 · Updated last year
- Show the time in Roman Numerals ☆11 · Jan 23, 2020 · Updated 6 years ago
- A BART version of an open-domain QA model in a closed-book setup ☆119 · Aug 13, 2020 · Updated 5 years ago
- This repository is a collection of existing KGQA datasets in the form of the 🤗 huggingface datasets -> https://github.com/huggingface/d… ☆113 · Jan 8, 2024 · Updated 2 years ago
- TSQA: Tabular Scenario Based Question Answering (AAAI 2021) ☆18 · Dec 17, 2020 · Updated 5 years ago
- Materials of public talks given by SJTU X-LANCE members ☆14 · Dec 3, 2022 · Updated 3 years ago
- Using YouTube to prepare a speech recognition dataset for any language ☆10 · Mar 30, 2021 · Updated 5 years ago
- A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca) ☆1,148 · Jan 4, 2024 · Updated 2 years ago
- RDT: Russian Distributional Thesaurus (Русский Дистрибутивный Тезаурус) ☆30 · Feb 28, 2019 · Updated 7 years ago
- ☆11 · Nov 5, 2021 · Updated 4 years ago
- ☆16 · Mar 17, 2025 · Updated last year
- An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous Open-domain Questions" ☆123 · Apr 23, 2022 · Updated 4 years ago
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022) ☆16 · Feb 11, 2023 · Updated 3 years ago
- Alpaca-lora for huggingface implementation using Deepspeed and FullyShardedDataParallel ☆24 · Apr 3, 2023 · Updated 3 years ago
- Fusion-in-Decoder