rasbt / pytorch-memory-optim

This repository contains the code for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog post.

☆92

Alternatives and similar repositories for pytorch-memory-optim

Users who are interested in pytorch-memory-optim compare it to the repositories listed below.

stas00 / ml-ways
ML/DL Math and Method notes
☆64 Updated 2 years ago
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆62 Updated 2 weeks ago
joey00072 / ohara
Collection of autoregressive model implementations
☆85 Updated 7 months ago
cloneofsimo / min-fsdp
☆91 Updated last year
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆168 Updated 10 months ago
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆73 Updated last year
eth-easl / fmengine
Utilities for Training Very Large Models
☆58 Updated last year
lessw2020 / transformer_central
Various transformers for FSDP research
☆38 Updated 3 years ago
rasbt / dora-from-scratch
LoRA and DoRA from Scratch Implementations
☆215 Updated last year
gpu-mode / profiling-cuda-in-torch
☆177 Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103 Updated 11 months ago
hamelsmu / llama-inference
experiments with inference on llama
☆103 Updated last year
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48 Updated last year
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆87 Updated last year
kshitij12345 / torchnnprofiler
Context Manager to profile the forward and backward times of PyTorch's nn.Module
☆83 Updated 2 years ago
geronimi73 / phi2-finetune
☆86 Updated last year
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 Updated last year
huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
☆160 Updated last year
lucidrains / pytorch-custom-utils
Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new…
☆125 Updated last year
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆46 Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
gpu-mode / triton-tutorials
☆15 Updated 6 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 Updated last year
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆112 Updated last month
rasbt / cvpr2023
☆134 Updated 2 years ago
llm-efficiency-challenge / neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
☆258 Updated 2 years ago
joey00072 / Tinytorch
A really tiny autograd engine
☆96 Updated 6 months ago
muellerzr / minimal-trainer-zoo
Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines
☆196 Updated last year
vdesai2014 / inference-optimization-blog-post
☆90 Updated last year
epfml / DenseFormer
☆82 Updated last year