Collection of papers and resources for data augmentation for NLP.
☆831 · Aug 12, 2022 · Updated 3 years ago
Alternatives and similar repositories for DataAug4NLP
Users who are interested in DataAug4NLP are comparing it to the libraries listed below.
- Data augmentation for NLP ☆4,644 · Jun 24, 2024 · Updated last year
- NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations ☆786 · May 19, 2024 · Updated last year
- Data augmentation for NLP, presented at EMNLP 2019 ☆1,650 · Mar 19, 2023 · Updated 2 years ago
- TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs… ☆3,364 · Jul 10, 2025 · Updated 7 months ago
- Survey of Surveys for Natural Language Processing (SOS4NLP) ☆327 · Jul 15, 2021 · Updated 4 years ago
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… ☆359 · Feb 22, 2022 · Updated 4 years ago
- [EMNLP 2021] Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification ☆130 · Mar 11, 2023 · Updated 2 years ago
- Active Learning for Text Classification in Python ☆639 · Feb 1, 2026 · Updated 3 weeks ago
- [EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821 ☆3,641 · Oct 16, 2024 · Updated last year
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList ☆2,048 · Jan 9, 2024 · Updated 2 years ago
- Code associated with the Don't Stop Pretraining ACL 2020 paper ☆539 · Nov 15, 2021 · Updated 4 years ago
- Parallelformers: An Efficient Model Parallelization Toolkit for Deployment ☆791 · Apr 24, 2023 · Updated 2 years ago
- TextAugment: Text Augmentation Library ☆432 · Dec 10, 2025 · Updated 2 months ago
- Must-read Papers on pre-trained language models. ☆3,365 · Nov 6, 2022 · Updated 3 years ago
- BERT-related papers ☆2,040 · Aug 12, 2023 · Updated 2 years ago
- ☆344 · Aug 3, 2021 · Updated 4 years ago
- [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o… ☆606 · Jun 15, 2022 · Updated 3 years ago
- Must-read papers on prompt-based tuning for pre-trained language models. ☆4,295 · Jul 17, 2023 · Updated 2 years ago
- An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p… ☆433 · Aug 17, 2022 · Updated 3 years ago
- BertViz: Visualize Attention in Transformer Models ☆7,921 · Jan 8, 2026 · Updated last month
- Graph4nlp is the library for the easy use of Graph Neural Networks for NLP. Welcome to visit our DLG4NLP website (https://dlg4nlp.github.… ☆1,685 · Jun 24, 2024 · Updated last year
- State-of-the-Art Text Embeddings ☆18,298 · Feb 20, 2026 · Updated last week
- The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020) ☆119 · Oct 8, 2020 · Updated 5 years ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Sep 2, 2024 · Updated last year
- A Unified Library for Parameter-Efficient and Modular Transfer Learning ☆2,801 · Oct 12, 2025 · Updated 4 months ago
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… ☆22,981 · Jul 28, 2024 · Updated last year
- Multi-Task Deep Neural Networks for Natural Language Understanding ☆2,258 · Mar 7, 2024 · Updated last year
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆32,159 · Sep 30, 2025 · Updated 4 months ago
- This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference" ☆1,628 · Jun 12, 2023 · Updated 2 years ago
- This repository contains the code for "Generating Datasets with Pretrained Language Models". ☆189 · Aug 17, 2021 · Updated 4 years ago
- ACL2020 Tutorial: Open-Domain Question Answering ☆835 · Jan 1, 2021 · Updated 5 years ago
- Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the… ☆2,081 · Aug 15, 2024 · Updated last year
- DGMs for NLP. A roadmap. ☆395 · Dec 12, 2022 · Updated 3 years ago
- Dense Passage Retriever - is a set of tools and models for open domain Q&A task. ☆1,860 · Apr 6, 2023 · Updated 2 years ago
- Unsupervised Data Augmentation (UDA) ☆2,204 · Aug 28, 2021 · Updated 4 years ago
- This repo is for Korean wiki table question answering datasets described in the paper of Korean-Specific Dataset for Table Question Answe… ☆91 · Oct 22, 2024 · Updated last year
- BERT score for text generation ☆1,873 · Jul 30, 2024 · Updated last year
- ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: giv… ☆461 · Sep 11, 2024 · Updated last year
- Code for using and evaluating SpanBERT. ☆904 · Jul 25, 2023 · Updated 2 years ago