leogao2 / commoncrawl_downloader
☆33 · Updated last year
Alternatives and similar repositories for commoncrawl_downloader:
Users who are interested in commoncrawl_downloader are comparing it to the repositories listed below.
- ☆77 · Updated last year
- ☆89 · Updated 2 years ago
- Code for Relevance-guided Supervision for OpenQA with ColBERT (TACL'21) ☆41 · Updated 3 years ago
- Open source library for few shot NLP ☆77 · Updated last year
- ☆97 · Updated 2 years ago
- Tools for managing datasets for governance and training. ☆83 · Updated last month
- Implementation of MARGE, Pre-training via Paraphrasing, in PyTorch ☆75 · Updated 4 years ago
- Training T5 to perform numerical reasoning. ☆23 · Updated 3 years ago
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆44 · Updated 10 months ago
- Downloads 2020 English Wikipedia articles as plaintext ☆23 · Updated 2 years ago
- ☆110 · Updated 2 years ago
- The corresponding code for our paper: "Exploring the Challenges of Open Domain Multi-Document Summarization". Do not hesitate to open an … ☆32 · Updated last year
- Original implementation of the EMNLP 2020 paper "AmbigQA: Answering Ambiguous Open-domain Questions" ☆118 · Updated 2 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset. ☆93 · Updated 2 years ago
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023) ☆22 · Updated last year
- Source code for the paper "Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints" ☆28 · Updated 2 years ago
- ☆128 · Updated 2 months ago
- A library for finding knowledge neurons in pretrained transformer models. ☆155 · Updated 3 years ago
- ☆147 · Updated 4 years ago
- A library for squeakily cleaning and filtering language datasets. ☆46 · Updated last year
- Documentation effort for the BookCorpus dataset ☆33 · Updated 3 years ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆29 · Updated 2 years ago
- ☆44 · Updated 4 months ago
- Official implementation of the paper "IteraTeR: Understanding Iterative Revision from Human-Written Text" (ACL 2022) ☆78 · Updated last year
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188) ☆61 · Updated last year
- For experiments involving InstructGPT. Currently used for documenting open research questions. ☆71 · Updated 2 years ago
- Prompt tuning toolkit for GPT-2 and GPT-Neo ☆88 · Updated 3 years ago
- Script for downloading GitHub. ☆91 · Updated 8 months ago
- An easy-to-use framework for large-scale fact-checking and question answering ☆69 · Updated last year
- A package for fine-tuning Transformers with TPUs, written in TensorFlow 2.0+ ☆37 · Updated 4 years ago