google-research-datasets / c4repset
C4RepSet: Representative Subset from C4 data for Training Pre-trained LMs
☆10 · Updated 2 years ago
Alternatives and similar repositories for c4repset
Users who are interested in c4repset are comparing it to the repositories listed below.
- ☆29 · Updated last year
- ☆97 · Updated 2 years ago
- Pretraining Efficiently on S2ORC! ☆164 · Updated 8 months ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w… ☆31 · Updated 2 years ago
- Repository for the Question Answering via Sentence Composition (QASC) dataset ☆56 · Updated last year
- Embedding Recycling for Language models ☆38 · Updated last year
- 🌾 Universal, customizable and deployable fine-grained evaluation for text generation. ☆23 · Updated last year
- Query-focused summarization data ☆42 · Updated 2 years ago
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆45 · Updated last year
- ☆13 · Updated 4 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆48 · Updated last year
- ☆19 · Updated 5 years ago
- ☆16 · Updated last year
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆29 · Updated 2 years ago
- ☆38 · Updated last year
- ☆100 · Updated 2 years ago
- ☆17 · Updated 2 years ago
- ZS4IE: A Toolkit for Zero-Shot Information Extraction with Simple Verbalizations ☆28 · Updated 3 years ago
- Generate BERT vocabularies and pretraining examples from Wikipedias ☆17 · Updated 5 years ago
- ☆46 · Updated 3 years ago
- Apps built using Inspired Cognition's Critique. ☆58 · Updated 2 years ago
- Code for "Open Vocabulary Extreme Classification Using Generative Models" ☆24 · Updated 2 years ago
- We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in … ☆54 · Updated last year
- CCQA A New Web-Scale Question Answering Dataset for Model Pre-Training ☆32 · Updated 2 years ago
- Helper scripts and notes that were used while porting various nlp models ☆46 · Updated 3 years ago
- ☆19 · Updated 5 years ago
- ⚡️ AllenNLP plugin for adding subcommands to use Optuna, making hyperparameter optimization easy ☆33 · Updated 3 years ago
- ☆22 · Updated 2 years ago
- Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx… ☆135 · Updated last year
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆33 · Updated last year