BlinkDL / minGPT-tuned
A *tuned* minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
☆114 · Updated 3 years ago
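minGPT-style projects are built around a compact PyTorch GPT implementation. As a point of reference, here is a minimal sketch of the causal self-attention block at the heart of such a model; the class name, dimensions, and layer layout are illustrative assumptions, not code copied from this repository:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention, the core of a minGPT-style block (illustrative sketch)."""

    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused Q, K, V projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # causal mask: token i may only attend to positions <= i
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape each to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # scaled dot-product attention with the causal mask applied
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

x = torch.randn(2, 8, 32)  # (batch, seq_len, n_embd)
attn = CausalSelfAttention(n_embd=32, n_head=4, block_size=16)
print(attn(x).shape)  # torch.Size([2, 8, 32])
```

A "tuned" fork like this one typically keeps this structure and adjusts initialization, learning-rate schedule, and similar training details.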
Alternatives and similar repositories for minGPT-tuned:
Users interested in minGPT-tuned are comparing it to the libraries listed below.
- A Transformer model based on the Gated Attention Unit (preview version) ☆97 · Updated 2 years ago
- A single-model, multi-scale VAE based on Transformer ☆55 · Updated 3 years ago
- FairSeq repo with Apollo optimizer ☆114 · Updated last year
- ICLR2023 - Tailoring Language Generation Models under Total Variation Distance ☆21 · Updated 2 years ago
- FLASHQuad_pytorch ☆67 · Updated 3 years ago
- A PyTorch & Keras implementation and demo of Fastformer. ☆188 · Updated 2 years ago
- Lion and Adam optimization comparison ☆61 · Updated 2 years ago
- ☆24 · Updated 2 years ago
- An Experiment on Dynamic NTK Scaling RoPE ☆64 · Updated last year
- ☆117 · Updated 2 years ago
- Transformers at any scale ☆41 · Updated last year
- Code for the paper "A Theoretical Analysis of the Repetition Problem in Text Generation" in AAAI 2021. ☆53 · Updated 2 years ago
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 · Updated 11 months ago
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023. ☆63 · Updated 5 months ago
- Source code for paper: Knowledge Inheritance for Pre-trained Language Models ☆38 · Updated 3 years ago
- Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch ☆45 · Updated 4 years ago
- Shuffling files of hundreds of gigabytes in Python ☆33 · Updated 3 years ago
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences" ☆70 · Updated 2 years ago
- Axial Positional Embedding for Pytorch ☆79 · Updated 2 months ago
- Implementation of Memformer, a Memory-augmented Transformer, in Pytorch ☆115 · Updated 4 years ago
- ☆64 · Updated 8 months ago
- ☆96 · Updated last year
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆104 · Updated 3 years ago
- Standalone Product Key Memory module in Pytorch - for augmenting Transformer models ☆78 · Updated 9 months ago
- Code for ACL 2023 paper titled "Lifting the Curse of Capacity Gap in Distilling Language Models" ☆28 · Updated last year
- A PyTorch implementation of the paper "Synthesizer: Rethinking Self-Attention in Transformer Models" ☆73 · Updated 2 years ago
- ☆61 · Updated 2 years ago
- PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation ☆34 · Updated 2 years ago
- reStructured Pre-training ☆98 · Updated 2 years ago
- A Tight-fisted Optimizer ☆47 · Updated 2 years ago