subramen / minGPT-ddp
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
☆22 · Updated 3 years ago
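minGPT-ddp extends minGPT with multi-GPU training via PyTorch's DistributedDataParallel (DDP). For orientation, the general DDP training pattern looks like the minimal sketch below (an illustration only, not code from the repo; the toy linear model, shapes, and hyperparameters are placeholders):

```python
# Minimal DDP training sketch, launched with:
#   torchrun --nproc_per_node=<num_gpus> train.py
# torchrun sets the RANK / WORLD_SIZE / LOCAL_RANK environment variables.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")  # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy stand-in for a GPT model and dataset.
    model = torch.nn.Linear(128, 128).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    data = TensorDataset(torch.randn(1024, 128), torch.randn(1024, 128))
    sampler = DistributedSampler(data)  # shards the dataset across ranks
    loader = DataLoader(data, batch_size=32, sampler=sampler)

    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = torch.nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()  # DDP all-reduces gradients across ranks here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Each rank holds a full model replica; DDP synchronizes gradients during `backward()`, so the per-rank optimizer steps stay identical without any explicit communication in the training loop.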
Alternatives and similar repositories for minGPT-ddp
Users interested in minGPT-ddp are comparing it to the libraries listed below.
- ☆186 · Updated last year
- Megatron's multi-modal data loader ☆278 · Updated last week
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆110 · Updated last week
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆336 · Updated 7 months ago
- ☆48 · Updated last year
- A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ability. ☆96 · Updated 11 months ago
- Experiments on Multi-Head Latent Attention ☆99 · Updated last year
- ☆132 · Updated 6 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆250 · Updated 3 months ago
- ViT inference in Triton because, why not? ☆32 · Updated last year
- ☆157 · Updated 2 years ago
- Recent Advances on Efficient Vision Transformers ☆55 · Updated 2 years ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆206 · Updated 5 months ago
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆121 · Updated 4 months ago
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆36 · Updated last year
- ☆303 · Updated 7 months ago
- Python pdb for multiple processes ☆62 · Updated 6 months ago
- [CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer ☆76 · Updated last year
- ☆27 · Updated 8 months ago
- ☆27 · Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆25 · Updated 4 months ago
- ☆293 · Updated 11 months ago
- Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels ☆112 · Updated 2 years ago
- Odysseus: Playground of LLM Sequence Parallelism ☆78 · Updated last year
- [CoLM'25] The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression> ☆150 · Updated 4 months ago
- ☆221 · Updated 2 years ago
- ring-attention experiments ☆160 · Updated last year
- Demystify RAM Usage in Multi-Process Data Loaders ☆204 · Updated 2 years ago
- pytorch-profiler ☆51 · Updated 2 years ago
- Patch convolution to avoid large GPU memory usage of Conv2D ☆93 · Updated 10 months ago