soskek / bookcorpus
Crawl BookCorpus
☆812 · Updated last year
Related projects
Alternatives and complementary repositories for bookcorpus
- Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models. ☆1,132 · Updated 9 months ago
- Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed. ☆714 · Updated last year
- Tools to download and clean up Common Crawl data ☆971 · Updated last year
- BLEURT is a metric for Natural Language Generation based on transfer learning. ☆697 · Updated last year
- jiant is an NLP toolkit ☆1,647 · Updated last year
- [DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations ☆773 · Updated 3 years ago
- Fast BPE ☆656 · Updated 5 months ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ☆2,892 · Updated last year
- Library for Knowledge Intensive Language Tasks ☆916 · Updated 2 years ago
- ☆1,252 · Updated last year
- An open clone of the GPT-2 WebText dataset by OpenAI. Still WIP. ☆385 · Updated 7 months ago
- Conditional Transformer Language Model for Controllable Generation ☆1,870 · Updated 3 years ago
- ☆1,508 · Updated last year
- Language-Agnostic SEntence Representations ☆3,600 · Updated 6 months ago
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an… ☆555 · Updated 2 years ago
- Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is design… ☆940 · Updated 3 years ago
- This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference" ☆1,623 · Updated last year
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ☆2,339 · Updated 7 months ago
- Codebase for testing whether hidden states of neural networks encode discrete structures. ☆383 · Updated 8 months ago
- An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.) ☆443 · Updated 3 weeks ago
- The implementation of DeBERTa ☆1,991 · Updated last year
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty… ☆631 · Updated last year
- Evaluating Cross-lingual Sentence Representations ☆442 · Updated 3 years ago
- 🐥 A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI ☆1,511 · Updated 3 years ago
- Code and model for the paper "Improving Language Understanding by Generative Pre-Training" ☆2,160 · Updated 5 years ago
- Code for "Learning to summarize from human feedback" ☆991 · Updated last year
- Adversarial Natural Language Inference Benchmark ☆389 · Updated 2 years ago
- Repository for the paper "Optimal Subarchitecture Extraction for BERT" ☆470 · Updated 2 years ago
- A Python tool for evaluating the quality of sentence embeddings. ☆2,087 · Updated 8 months ago
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,187 · Updated last month