gnovack/distributed-training-and-deepspeed (☆13, updated last year)
Related projects:
- Various transformers for FSDP research (☆31, updated last year)
- A place to store reusable transformer components of my own creation or found on the interwebs (☆43, updated 3 weeks ago)
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… (☆73, updated last month)
- Learn CUDA with PyTorch (☆11, updated last month)
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset (☆91, updated last year)
- A fast implementation of T5/UL2 in PyTorch using Flash Attention (☆60, updated this week)
- ML/DL Math and Method notes (☆56, updated 9 months ago)
- CUDA and Triton implementations of Flash Attention with SoftmaxN (☆66, updated 3 months ago)
- This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… (☆84, updated last year)
- Experiments with inference on llama (☆106, updated 3 months ago)
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (☆150, updated last week)
- Genalog is an open source, cross-platform Python package allowing generation of synthetic document images with custom degradations and te… (☆42, updated 8 months ago)
- Small scale distributed training of sequential deep learning models, built on NumPy and MPI (☆88, updated 11 months ago)
- Code for NeurIPS LLM Efficiency Challenge (☆52, updated 5 months ago)
- A set of scripts and notebooks on LLM fine-tuning and dataset creation (☆89, updated last week)
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers (☆53, updated 2 months ago)
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* (☆77, updated 9 months ago)
- Utilities for Training Very Large Models (☆56, updated last week)
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry (☆36, updated 8 months ago)
- Simple and efficient PyTorch-native transformer training and inference (batched) (☆53, updated 5 months ago)
- Pragmatic approach to parsing import profiles for CIs (☆11, updated 2 months ago)
- Official repo for "On the Generalization Ability of Retrieval-Enhanced Transformers" (☆35, updated 3 months ago)
- Triton Implementation of HyperAttention Algorithm (☆46, updated 9 months ago)
- Context Manager to profile the forward and backward times of PyTorch's nn.Module (☆83, updated 11 months ago); a minimal sketch of this idea appears below
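As a rough illustration of what the last entry describes, here is a minimal sketch of hook-based forward/backward timing for an `nn.Module`. The helper name `profile_module` and its structure are invented for this example and are not the linked repository's actual API; it only assumes standard PyTorch module hooks (PyTorch >= 2.0 for the backward pre-hook).

```python
# Illustrative sketch only -- not the linked repo's implementation.
import time
from contextlib import contextmanager

import torch
import torch.nn as nn


@contextmanager
def profile_module(module: nn.Module):
    """Accumulate wall-clock time spent in `module`'s forward and backward passes.

    For CUDA modules, add torch.cuda.synchronize() inside each hook,
    since kernels launch asynchronously and wall-clock time would otherwise
    only measure kernel launch overhead.
    """
    times = {"forward": 0.0, "backward": 0.0}
    starts = {}

    def fwd_start(mod, inputs):
        starts["forward"] = time.perf_counter()

    def fwd_end(mod, inputs, output):
        times["forward"] += time.perf_counter() - starts["forward"]

    def bwd_start(mod, grad_output):
        starts["backward"] = time.perf_counter()

    def bwd_end(mod, grad_input, grad_output):
        times["backward"] += time.perf_counter() - starts["backward"]

    handles = [
        module.register_forward_pre_hook(fwd_start),
        module.register_forward_hook(fwd_end),
        module.register_full_backward_pre_hook(bwd_start),
        module.register_full_backward_hook(bwd_end),
    ]
    try:
        yield times
    finally:
        # Remove hooks so the module is left untouched after profiling.
        for h in handles:
            h.remove()


if __name__ == "__main__":
    model = nn.Linear(512, 512)
    with profile_module(model) as times:
        for _ in range(10):
            model(torch.randn(8, 512)).sum().backward()
    print(times)  # e.g. {'forward': 0.0012, 'backward': 0.0021}
```

The appeal of this pattern is that the model code itself stays unchanged: all timing lives in hooks that are registered on entry to the context manager and removed on exit.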