styfeng/DataAug4NLP

Collection of papers and resources for data augmentation for NLP.

☆834

Alternatives and similar repositories for DataAug4NLP

Users who are interested in DataAug4NLP are comparing it to the libraries listed below.

makcedward / nlpaug
Data augmentation for NLP
☆4,662 · Updated this week
GEM-benchmark / NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
☆786 · May 19, 2024 · Updated 2 years ago
jasonwei20 / eda_nlp
Data augmentation for NLP, presented at EMNLP 2019
☆1,651 · Mar 19, 2023 · Updated 3 years ago
lancopku / text-autoaugment
[EMNLP 2021] Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification
☆129 · Mar 11, 2023 · Updated 3 years ago
QData / TextAttack
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs…
☆3,449 · Apr 17, 2026 · Updated 3 months ago
facebookresearch / SentAugment
SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c…
☆359 · Feb 22, 2022 · Updated 4 years ago
princeton-nlp / SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
☆3,655 · Oct 16, 2024 · Updated last year
thunlp / SOS4NLP
Survey of Surveys for Natural Language Processing (SOS4NLP)
☆327 · Jul 15, 2021 · Updated 5 years ago
marcotcr / checklist
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
☆2,051 · Jan 9, 2024 · Updated 2 years ago
dsfsi / textaugment
TextAugment: Text Augmentation Library
☆442 · Mar 4, 2026 · Updated 4 months ago
webis-de / small-text
Active Learning for Text Classification in Python
☆646 · May 24, 2026 · Updated last month
allenai / dont-stop-pretraining
Code associated with the Don't Stop Pretraining ACL 2020 paper
☆543 · Nov 15, 2021 · Updated 4 years ago
tunib-ai / parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
☆787 · Apr 24, 2023 · Updated 3 years ago
styfeng / GenAug
Code for GenAug: Data Augmentation for Finetuning Text Generators.
☆28 · Oct 8, 2021 · Updated 4 years ago
thunlp / PromptPapers
Must-read papers on prompt-based tuning for pre-trained language models.
☆4,319 · Jul 17, 2023 · Updated 3 years ago
thunlp / PLMpapers
Must-read Papers on pre-trained language models.
☆3,361 · Nov 6, 2022 · Updated 3 years ago
tomohideshibata / BERT-related-papers
BERT-related papers
☆2,034 · Aug 12, 2023 · Updated 2 years ago
hkjeon13 / noising-korean
Adds noise to Korean documents.
☆27 · Nov 9, 2022 · Updated 3 years ago
facebookresearch / fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
☆32,244 · Sep 30, 2025 · Updated 9 months ago
sebastianruder / NLP-progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo…
☆22,957 · Jul 28, 2024 · Updated last year
google-research / uda
Unsupervised Data Augmentation (UDA)
☆2,206 · Aug 28, 2021 · Updated 4 years ago
zhaominyiz / EPiDA
Official Code for 'EPiDA: An Easy Plug-in Data Augmentation Framework for High Performance Text Classification' - NAACL 2022
☆23 · May 9, 2022 · Updated 4 years ago
jessevig / bertviz
BertViz: Visualize Attention in Transformer Models
☆8,124 · Jan 8, 2026 · Updated 6 months ago
allenai / naacl2021-longdoc-tutorial
☆343 · Aug 3, 2021 · Updated 4 years ago
KennethEnevoldsen / augmenty
Augmenty is an augmentation library based on spaCy for augmenting texts.
☆156 · May 24, 2024 · Updated 2 years ago
huggingface / sentence-transformers
State-of-the-Art Embeddings, Retrieval, and Reranking
☆18,920 · Updated this week
princeton-nlp / DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; [EMNLP 2021] Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o…
☆607 · Jun 15, 2022 · Updated 4 years ago
FranxYao / Deep-Generative-Models-for-Natural-Language-Processing
DGMs for NLP. A roadmap.
☆393 · Dec 12, 2022 · Updated 3 years ago
monologg / KoBigBird
🦅 Pretrained BigBird Model for Korean (up to 4096 tokens)
☆202 · Dec 28, 2023 · Updated 2 years ago
facebookresearch / SpanBERT
Code for using and evaluating SpanBERT.
☆908 · Jul 25, 2023 · Updated 2 years ago
SALT-NLP / MixText
MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification
☆355 · Jun 5, 2020 · Updated 6 years ago
violet-zct / fairseq-dro-mnmt
☆14 · Sep 10, 2021 · Updated 4 years ago
timoschick / pet
This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
☆1,625 · Jun 12, 2023 · Updated 3 years ago
NorskRegnesentral / skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
☆925 · Sep 2, 2024 · Updated last year
facebookresearch / DPR
Dense Passage Retriever is a set of tools and models for open-domain Q&A tasks.
☆1,867 · Apr 6, 2023 · Updated 3 years ago
microsoft / fastseq
An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p…
☆433 · Aug 17, 2022 · Updated 3 years ago
DFKI-NLP / thermostat
Collection of NLP model explanations and accompanying analysis tools
☆141 · Jun 26, 2023 · Updated 3 years ago
google-research-datasets / ToTTo
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: giv…
☆465 · Sep 11, 2024 · Updated last year
graph4ai / graph4nlp
Graph4nlp is the library for the easy use of Graph Neural Networks for NLP. Welcome to visit our DLG4NLP website (https://dlg4nlp.github.…
☆1,689 · Jun 24, 2024 · Updated 2 years ago