facebookresearch / cc_net
Tools to download and clean up Common Crawl data
☆1,010 Updated 2 years ago
Alternatives and similar repositories for cc_net
Users who are interested in cc_net are comparing it to the repositories listed below.
- ☆1,218 Updated 10 months ago
- Expanding natural instructions ☆995 Updated last year
- All-in-one text de-duplication ☆679 Updated last week
- Dense Passage Retriever: a set of tools and models for open-domain Q&A tasks. ☆1,801 Updated 2 years ago
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ☆997 Updated 10 months ago
- ☆507 Updated last year
- Code used for sourcing and cleaning the BigScience ROOTS corpus ☆312 Updated 2 years ago
- BLEURT is a metric for Natural Language Generation based on transfer learning. ☆733 Updated last year
- Autoregressive Entity Retrieval ☆788 Updated last year
- Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning ☆737 Updated 2 years ago
- A novel method to tune language models. Code and datasets for the paper "GPT Understands, Too". ☆933 Updated 2 years ago
- ☆1,520 Updated last month
- Prefix-Tuning: Optimizing Continuous Prompts for Generation ☆930 Updated last year
- Code for using and evaluating SpanBERT. ☆899 Updated last year
- Fast Inference Solutions for BLOOM ☆564 Updated 7 months ago
- Crosslingual Generalization through Multitask Finetuning ☆535 Updated 8 months ago