facebookresearch / cc_net
Tools to download and clean up Common Crawl data
☆983 Updated last year
Alternatives and similar repositories for cc_net:
Users who are interested in cc_net are comparing it to the libraries listed below.
- BLEURT is a metric for Natural Language Generation based on transfer learning. ☆715 Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,364 Updated 11 months ago
- ☆1,176 Updated 6 months ago
- Expanding natural instructions ☆975 Updated last year
- All-in-one text de-duplication ☆655 Updated 9 months ago
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ☆984 Updated 6 months ago
- A research project for natural language generation, containing the official implementations by MSRA NLC team. ☆705 Updated 6 months ago
- Fast BPE ☆662 Updated 8 months ago
- Code used for sourcing and cleaning the BigScience ROOTS corpus ☆307 Updated last year
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons ☆1,101 Updated last month
- BERT score for text generation ☆1,676 Updated 6 months ago
- [ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723 ☆725 Updated 2 years ago
- Crosslingual Generalization through Multitask Finetuning ☆525 Updated 5 months ago
- Autoregressive Entity Retrieval ☆781 Updated last year
- ☆1,497 Updated last week
- A full Python Implementation of the ROUGE Metric (not a wrapper) ☆685 Updated 3 months ago
- FastFormers - highly efficient transformer models for NLU ☆704 Updated last year
- Fast Inference Solutions for BLOOM ☆563 Updated 4 months ago
- ☆494 Updated last year
- Code for using and evaluating SpanBERT. ☆895 Updated last year
- ☆1,263 Updated 2 years ago
- [DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations ☆783 Updated 3 years ago
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty… ☆638 Updated 2 years ago
- An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p… ☆430 Updated 2 years ago
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation ☆469 Updated 11 months ago
- Adversarial Natural Language Inference Benchmark ☆396 Updated 2 years ago
- ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. ☆572 Updated last year
- The implementation of DeBERTa ☆2,036 Updated last year
- [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o… ☆603 Updated 2 years ago
- Dense Passage Retriever - is a set of tools and models for open domain Q&A task. ☆1,753 Updated last year