TsinghuaAI / TDS

A plug-in for Microsoft DeepSpeed that fixes a bug in the DeepSpeed pipeline

☆25

Alternatives and similar repositories for TDS

Users that are interested in TDS are comparing it to the libraries listed below

TsinghuaAI / CPM-1-Pretrain
Pretrain CPM-1
☆52 Updated 4 years ago
hpcaitech / PaLM-colossalai
Scalable PaLM implementation in PyTorch
☆189 Updated 2 years ago
princeton-nlp / DinkyTrain
Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃
☆114 Updated 3 years ago
TsinghuaAI / CPM
Introduction to CPM
☆166 Updated 4 years ago
TobiasLee / Awesome-Efficient-PLM
Must-read papers on improving efficiency for pre-trained language models.
☆105 Updated 3 years ago
dimil6666 / shannon.ai-breaking-news
Backup of Zhihu exposé posts about Shannon.AI (Beijing Shannon Huiyu Technology Co., Ltd.)
☆43 Updated 5 years ago
thunlp / TR-BERT
Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"
☆48 Updated 3 years ago
microsoft / BANG
BANG is a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generat…
☆28 Updated 3 years ago
fuzihaofzh / repetition-problem-nlg
Code for the paper "A Theoretical Analysis of the Repetition Problem in Text Generation" in AAAI 2021.
☆57 Updated 3 years ago
ExpressAI / reStructured-Pretraining
reStructured Pre-training
☆98 Updated 2 years ago
TsinghuaAI / CUGE
☆54 Updated 3 years ago
LorrinWWW / SkipBERT
Code associated with the paper **SkipBERT: Efficient Inference with Shallow Layer Skipping**, at ACL 2022
☆16 Updated 3 years ago
ProjectD-AI / LLaMA-Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆69 Updated 2 years ago
bytedance / ParaGen
ParaGen is a PyTorch deep learning framework for parallel sequence generation.
☆185 Updated 3 years ago
castorini / DeeBERT
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
☆160 Updated 3 years ago
princeton-nlp / CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
☆198 Updated 2 years ago
IBM / PoWER-BERT
Method to improve inference time for BERT. This is an implementation of the paper titled "PoWER-BERT: Accelerating BERT Inference via Pro…
☆62 Updated 2 months ago
THUDM / iPrompt
Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting
☆124 Updated 2 years ago
kssteven418 / LTP
[KDD'22] Learned Token Pruning for Transformers
☆101 Updated 2 years ago
txsun1997 / nlp-paradigm-shift
Paradigm shift in natural language processing
☆42 Updated 3 years ago
QipengGuo / NLP-Notes
Notes from my introduction to NLP at Fudan University
☆37 Updated 4 years ago
RUCKBReasoning / GLM-Dialog
☆59 Updated 2 years ago
fastnlp / ElasticBERT
A pre-trained model with multi-exit transformer architecture.
☆56 Updated 2 years ago
THUDM / icetk
A unified tokenization tool for Images, Chinese and English.
☆153 Updated 2 years ago
microsoft / SEED-Encoder
☆45 Updated 4 years ago
keezen / ntk_alibi
NTK scaled version of ALiBi position encoding in Transformer.
☆69 Updated 2 years ago
princeton-nlp / TRIME
[EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674
☆196 Updated 2 years ago
TsinghuaAI / CPM-2-Pretrain
Code for CPM-2 Pre-Train
☆158 Updated 2 years ago
TsinghuaAI / CPM-1-Finetune
Finetune CPM-1
☆75 Updated 2 years ago
ShannonAI / GNN-LM
☆46 Updated 3 years ago