Collection of papers and resources for data augmentation for NLP.
☆831 · Aug 12, 2022 · Updated 3 years ago
Alternatives and similar repositories for DataAug4NLP
Users who are interested in DataAug4NLP are comparing it to the libraries listed below.
- Data augmentation for NLP ☆4,650 · Jun 24, 2024 · Updated last year
- NL-Augmenter: A Collaborative Repository of Natural Language Transformations ☆786 · May 19, 2024 · Updated last year
- Data augmentation for NLP, presented at EMNLP 2019 ☆1,651 · Mar 19, 2023 · Updated 3 years ago
- [EMNLP 2021] Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification ☆130 · Mar 11, 2023 · Updated 3 years ago
- TextAttack is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs… ☆3,379 · Jul 10, 2025 · Updated 8 months ago
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… ☆359 · Feb 22, 2022 · Updated 4 years ago
- [EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821 ☆3,643 · Oct 16, 2024 · Updated last year
- Survey of Surveys for Natural Language Processing (SOS4NLP) ☆327 · Jul 15, 2021 · Updated 4 years ago
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList ☆2,050 · Jan 9, 2024 · Updated 2 years ago
- TextAugment: Text Augmentation Library ☆433 · Mar 4, 2026 · Updated 2 weeks ago
- Active Learning for Text Classification in Python ☆637 · Mar 8, 2026 · Updated last week
- Code associated with the Don't Stop Pretraining ACL 2020 paper ☆540 · Nov 15, 2021 · Updated 4 years ago
- Code for GenAug: Data Augmentation for Finetuning Text Generators. ☆27 · Oct 8, 2021 · Updated 4 years ago
- Parallelformers: An Efficient Model Parallelization Toolkit for Deployment ☆791 · Apr 24, 2023 · Updated 2 years ago
- Must-read papers on prompt-based tuning for pre-trained language models. ☆4,296 · Jul 17, 2023 · Updated 2 years ago
- Must-read Papers on pre-trained language models. ☆3,362 · Nov 6, 2022 · Updated 3 years ago
- BERT-related papers ☆2,039 · Aug 12, 2023 · Updated 2 years ago
- Adds noise to Korean text. ☆27 · Nov 9, 2022 · Updated 3 years ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆32,191 · Sep 30, 2025 · Updated 5 months ago
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… ☆22,975 · Jul 28, 2024 · Updated last year
- Unsupervised Data Augmentation (UDA) ☆2,202 · Aug 28, 2021 · Updated 4 years ago
- Official Code for 'EPiDA: An Easy Plug-in Data Augmentation Framework for High Performance Text Classification' - NAACL 2022 ☆23 · May 9, 2022 · Updated 3 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆157 · May 24, 2024 · Updated last year
- BertViz: Visualize Attention in Transformer Models ☆7,954 · Jan 8, 2026 · Updated 2 months ago
- ☆344 · Aug 3, 2021 · Updated 4 years ago
- [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o… ☆606 · Jun 15, 2022 · Updated 3 years ago
- State-of-the-Art Text Embeddings ☆18,390 · Mar 12, 2026 · Updated last week
- DGMs for NLP. A roadmap. ☆396 · Dec 12, 2022 · Updated 3 years ago
- MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification ☆356 · Jun 5, 2020 · Updated 5 years ago
- Code for using and evaluating SpanBERT. ☆906 · Jul 25, 2023 · Updated 2 years ago
- Pretrained BigBird Model for Korean (up to 4096 tokens) ☆201 · Dec 28, 2023 · Updated 2 years ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Sep 2, 2024 · Updated last year
- This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference" ☆1,626 · Jun 12, 2023 · Updated 2 years ago
- Collection of NLP model explanations and accompanying analysis tools ☆144 · Jun 26, 2023 · Updated 2 years ago
- An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p… ☆433 · Aug 17, 2022 · Updated 3 years ago
- Dense Passage Retriever - is a set of tools and models for open domain Q&A task. ☆1,863 · Apr 6, 2023 · Updated 2 years ago
- ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: giv… ☆461 · Sep 11, 2024 · Updated last year
- ☆42 · Jan 11, 2021 · Updated 5 years ago
- Multi-Task Deep Neural Networks for Natural Language Understanding ☆2,257 · Mar 7, 2024 · Updated 2 years ago