BlinkDL / minGPT-tuned

A *tuned* minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

☆115

Alternatives and similar repositories for minGPT-tuned

Users who are interested in minGPT-tuned are comparing it to the libraries listed below.

bojone / univae
A single-model, multi-scale VAE based on Transformer
☆57 Updated 4 years ago
thunlp / Knowledge-Inheritance
Source code for paper: Knowledge Inheritance for Pre-trained Language Models
☆38 Updated 3 years ago
lucidrains / coco-lm-pytorch
Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch
☆46 Updated 4 years ago
lucidrains / memory-compressed-attention
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"
☆71 Updated 2 years ago
Lightning-Universe / lightning-ColossalAI
Large Scale Distributed Model Training strategy with Colossal AI and Lightning AI
☆57 Updated last year
fuzihaofzh / repetition-problem-nlg
Code for the paper "A Theoretical Analysis of the Repetition Problem in Text Generation" in AAAI 2021.
☆54 Updated 2 years ago
fastnlp / ElasticBERT
A pre-trained model with multi-exit transformer architecture.
☆54 Updated 2 years ago
Jxu-Thu / DITTO
The code of paper "Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation" published at NeurIPS 202…
☆46 Updated 2 years ago
GeeeekExplorer / cupytorch
A small framework that mimics PyTorch using CuPy or NumPy
☆41 Updated 3 years ago
lxk00 / BERT-EMD
☆50 Updated 2 years ago
10-zin / Synthesizer
A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models"
☆73 Updated 2 years ago
sunyt32 / torchscale
Transformers at any scale
☆41 Updated last year
bigscience-workshop / architecture-objective
☆97 Updated last year
cimeister / typical-sampling
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
☆82 Updated 3 years ago
CyndxAI / QKNorm
Code for the paper "Query-Key Normalization for Transformers"
☆43 Updated 4 years ago
bojone / shuffle
Shuffle files of several hundred GB in Python
☆33 Updated 3 years ago
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆64 Updated last year
lucidrains / memory-transformer-xl
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
☆49 Updated 4 years ago
leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models
Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch
☆70 Updated 5 years ago
wuch15 / Fastformer
A PyTorch & Keras implementation and demo of Fastformer.
☆189 Updated 2 years ago
cloneofsimo / realformer-pytorch
Implementation of RealFormer using pytorch
☆100 Updated 4 years ago
ChenghaoMou / pytorch-pQRNN
Implementation of pQRNN in PyTorch
☆46 Updated 3 years ago
XuezheMax / fairseq-apollo
FairSeq repo with Apollo optimizer
☆114 Updated last year
overwindows / PALM
PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation
☆34 Updated 2 years ago
rsvp-ai / segatron_aaai
Code and pre-trained models for the paper "Segatron: Segment-aware Transformer for Language Modeling and Understanding"
☆18 Updated 2 years ago
THUDM / Multilingual-GLM
The multilingual variant of GLM, a general language model trained with autoregressive blank infilling objective
☆62 Updated 2 years ago
BlinkDL / RWKV-v2-RNN-Pile
RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
☆67 Updated 2 years ago
BAAI-WuDao / P-tuning
Finetune CPM-1
☆24 Updated 4 years ago
ZhuiyiTechnology / GAU-alpha
A Transformer model based on the Gated Attention Unit (preview version)
☆98 Updated 2 years ago
haorannlp / mix
Code for "Mixed Cross Entropy Loss for Neural Machine Translation"
☆20 Updated 3 years ago