rasbt / pytorch-memory-optim
This repository contains the code for my blog post "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch".
☆86 · Updated last year
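The blog post compares memory-optimization techniques for training large models in PyTorch. As an illustrative sketch of the kind of technique it covers, the snippet below uses automatic mixed precision to shrink activation memory; the model, data, and hyperparameters are hypothetical placeholders, it assumes a CUDA device is available, and it is not code taken from this repository.

```python
import torch
import torch.nn as nn

# Hypothetical toy model and batch, stand-ins for the LLM/ViT training setups
# discussed in the blog post; assumes a CUDA-capable GPU.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

inputs = torch.randn(64, 512, device="cuda")
targets = torch.randint(0, 10, (64,), device="cuda")

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in float16 where safe, which reduces activation memory.
    with torch.cuda.amp.autocast():
        loss = nn.functional.cross_entropy(model(inputs), targets)
    # Scale the loss so float16 gradients do not underflow, then step and update.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```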
Related projects
Alternatives and complementary repositories for pytorch-memory-optim
- Collection of autoregressive model implementations ☆67 · Updated this week
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… ☆119 · Updated 3 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆92 · Updated last month
- ☆73 · Updated 4 months ago
- LoRA and DoRA from Scratch Implementations ☆188 · Updated 8 months ago
- ☆133 · Updated 9 months ago
- ML/DL Math and Method notes ☆57 · Updated 11 months ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆157 · Updated 10 months ago
- ring-attention experiments ☆97 · Updated last month
- The simplest, fastest repository for training/finetuning medium-sized GPTs ☆84 · Updated last week
- CUDA and Triton implementations of Flash Attention with SoftmaxN ☆66 · Updated 5 months ago
- ☆45 · Updated 2 months ago
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆80 · Updated 11 months ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆113 · Updated 7 months ago
- Triton implementation of the HyperAttention algorithm ☆46 · Updated 11 months ago
- Set of scripts to finetune LLMs ☆36 · Updated 7 months ago
- ☆49 · Updated 8 months ago
- ☆76 · Updated 7 months ago
- PB-LLM: Partially Binarized Large Language Models ☆148 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆135 · Updated last month
- A set of scripts and notebooks on LLM finetuning and dataset creation ☆93 · Updated last month
- ☆127 · Updated last year
- Prune transformer layers ☆64 · Updated 5 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 · Updated last month
- σ-GPT: A New Approach to Autoregressive Models ☆59 · Updated 3 months ago
- ☆82 · Updated 8 months ago
- Implementation of Infini-Transformer in PyTorch ☆104 · Updated last month
- ☆63 · Updated 4 months ago
- Utilities for Training Very Large Models ☆56 · Updated last month
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆38 · Updated 10 months ago