subramen / minGPT-ddp
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
☆20 · Updated 2 years ago
Alternatives and similar repositories for minGPT-ddp
Users interested in minGPT-ddp are comparing it to the repositories listed below
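minGPT-ddp's distinguishing feature is wrapping a minimal GPT training loop in PyTorch's DistributedDataParallel. As a rough sketch of that pattern (not the repo's actual code), a single-node CPU run with the `gloo` backend and a toy linear model in place of the GPT looks like this:

```python
# Minimal sketch of the DDP training pattern minGPT-ddp demonstrates.
# Assumes a single-process, CPU-only run with the "gloo" backend;
# the Linear layer is a toy stand-in for the GPT model.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step():
    # Rendezvous info normally supplied by torchrun; hardcoded for one process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    # DDP synchronizes gradients across ranks during backward().
    model = DDP(torch.nn.Linear(8, 8))
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    x = torch.randn(4, 8)
    loss = model(x).pow(2).mean()  # dummy loss in place of cross-entropy
    loss.backward()
    opt.step()

    dist.destroy_process_group()
    return loss.item()

if __name__ == "__main__":
    print(train_step())
```

In a real multi-GPU run the script would be launched with `torchrun`, which sets the rank and world size per process; the repo itself targets that setup.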
- ☆31 · Updated 11 months ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆101 · Updated 8 months ago
- ☆181 · Updated 8 months ago
- A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ability ☆93 · Updated 5 months ago
- ☆144 · Updated this week
- Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE ☆17 · Updated this week
- Patch convolution to avoid large GPU memory usage of Conv2D ☆87 · Updated 4 months ago
- [NeurIPS 2022 Spotlight] Official PyTorch implementation of "EcoFormer: Energy-Saving Attention with Linear Complexity" ☆72 · Updated 2 years ago
- ImageNet-12k subset of ImageNet-21k (fall11) ☆21 · Updated last year
- Timm model explorer ☆39 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆111 · Updated 5 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆123 · Updated last year
- ☆39 · Updated 7 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆39 · Updated last year
- ☆68 · Updated 10 months ago
- Transformers w/o Attention, based fully on MLPs ☆93 · Updated last year
- ☆43 · Updated 7 months ago
- [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…" ☆27 · Updated last year
- ☆50 · Updated last year
- ☆93 · Updated last week
- PyTorch implementation of moe, which stands for mixture of experts ☆43 · Updated 4 years ago
- Official implementation of "The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs" ☆32 · Updated last month
- A PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision" ☆36 · Updated last year
- Flash-Muon: An Efficient Implementation of the Muon Optimizer ☆121 · Updated last week
- [ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa… ☆76 · Updated 3 years ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆94 · Updated this week
- Efficient Mixture of Experts for LLMs paper list ☆68 · Updated 5 months ago
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆33 · Updated last year
- ☆26 · Updated last year
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆48 · Updated 10 months ago