rish-16 / gpt3-pytorch
Unofficial PyTorch Implementation of OpenAI's GPT-3
☆13 · Updated 3 years ago
Alternatives and similar repositories for gpt3-pytorch
Users interested in gpt3-pytorch are comparing it to the libraries listed below.
- A *tuned* minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training ☆120 · Updated 4 years ago
- Closed-form solutions for logistic regression and single-layer softmax ☆12 · Updated 4 years ago
- Code for the paper "Deformable Butterfly: A Highly Structured and Sparse Linear Transform" ☆16 · Updated 4 years ago
- Large-scale distributed model training strategies with Colossal AI and Lightning AI ☆56 · Updated 2 years ago
- (ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts" ☆17 · Updated 2 years ago
- ALBERT for the Conversational Question Answering Challenge ☆22 · Updated 2 years ago
- Code for the paper "Query-Key Normalization for Transformers" ☆51 · Updated 4 years ago
- Implementation of Token Shift GPT - an autoregressive model that relies solely on shifting the sequence space for mixing (see the first sketch after this list) ☆49 · Updated 4 years ago
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia by Summarizing Long Sequences" ☆70 · Updated 2 years ago
- A translation task using TurboTransformers ☆11 · Updated 5 years ago
- ☆24 · Updated 3 years ago
- A variant of Transformer-XL where the memory is updated not with a queue, but with attention ☆49 · Updated 5 years ago
- Implementation of N-Grammer, augmenting Transformers with latent n-grams, in PyTorch ☆76 · Updated 3 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 (see the second sketch after this list) ☆49 · Updated 3 years ago
- ☆27 · Updated 6 months ago
- Fine-tune CPM-1 ☆24 · Updated 4 years ago
- GPT-2 fine-tuning with Transformers 🤗 ☆28 · Updated 5 years ago
- Implementation of Multistream Transformers in PyTorch ☆54 · Updated 4 years ago
- My explorations into editing the knowledge and memories of an attention network ☆35 · Updated 3 years ago
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 · Updated 4 years ago
- Local Attention - Flax module for JAX ☆22 · Updated 4 years ago
- Emotion-aware dialogue response generation by multi-task learning ☆13 · Updated 4 years ago
- Virtual Adversarial Training (VAT) techniques in PyTorch ☆17 · Updated 3 years ago
- Transformers at any scale ☆42 · Updated 2 years ago
- Code associated with the paper "SkipBERT: Efficient Inference with Shallow Layer Skipping", at ACL 2022 ☆16 · Updated 3 years ago
- Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in PyTorch ☆46 · Updated 4 years ago
- An implementation of model-parallel autoregressive transformers on GPUs, based on the DeepSpeed library ☆21 · Updated 3 years ago
- An implementation of an autoregressive language model using an improved Transformer and DeepSpeed pipeline parallelism ☆30 · Updated last month
- ☆52 · Updated 3 years ago
- Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Tra… ☆33 · Updated 4 years ago
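Two of the listed ideas are compact enough to illustrate inline. Token Shift GPT mixes information across positions without attention by shifting part of the feature channels one step back along the sequence. Below is a minimal sketch of that mixing step, written from the one-line description above rather than taken from the linked repository:

```python
import torch
import torch.nn.functional as F

def token_shift(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, dim). Split the channels in half and shift one
    # half a single step back in time, so position t also sees features
    # from position t - 1 (causal: no information from future tokens).
    x_shift, x_keep = x.chunk(2, dim=-1)
    x_shift = F.pad(x_shift, (0, 0, 1, -1))  # pad the front, trim the last step
    return torch.cat((x_shift, x_keep), dim=-1)

x = torch.randn(2, 16, 64)
assert token_shift(x).shape == (2, 16, 64)
```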
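Similarly, ReLA (Rectified Linear Attention, https://arxiv.org/abs/2104.07012) replaces the softmax over attention scores with a ReLU, which makes the attention weights naturally sparse. The paper additionally stabilizes training with a gated RMSNorm on the attention output, which this simplified sketch omits:

```python
import torch
import torch.nn.functional as F

def rela_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, heads, seq_len, head_dim)
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum('bhid,bhjd->bhij', q, k) * scale
    weights = F.relu(scores)  # sparse, unnormalized attention weights
    return torch.einsum('bhij,bhjd->bhid', weights, v)

q = k = v = torch.randn(1, 4, 16, 32)
print(rela_attention(q, k, v).shape)  # torch.Size([1, 4, 16, 32])
```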