lessw2020 / transformer_central

Various transformers for FSDP research

☆38

Alternatives and similar repositories for transformer_central

Users who are interested in transformer_central are comparing it to the libraries listed below.

warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆116 Updated 2 months ago
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆62 Updated 2 weeks ago
sholtodouglas / scalingExperiments
☆62 Updated 3 years ago
cat-state / tinypar
☆20 Updated 2 years ago
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆216 Updated last year
mgmalek / efficient_cross_entropy
☆121 Updated last year
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintainable TPU-Training
☆51 Updated last year
lucidrains / flash-attention-jax
Implementation of Flash Attention in Jax
☆222 Updated last year
cloneofsimo / min-fsdp
☆91 Updated last year
lucidrains / pytorch-custom-utils
Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new…
☆125 Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆87 Updated 3 years ago
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21 Updated 6 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 Updated last year
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆53 Updated 2 years ago
rasbt / pytorch-memory-optim
This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po…
☆92 Updated 2 years ago
catie-aq / flashT5
A fast implementation of T5/UL2 in PyTorch using Flash Attention
☆112 Updated last month
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230 Updated last year
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆87 Updated last year
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filtered 8K sequences on Pile
☆116 Updated 2 years ago
google / praxis
☆190 Updated 2 weeks ago
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆37 Updated 2 years ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆277 Updated 3 years ago
lianakoleva / no-libtorch-compile
☆21 Updated 9 months ago
huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
☆160 Updated last year
google-research / jestimator
Amos optimizer with JEstimator lib.
☆82 Updated last year
HomebrewML / HomebrewNLP-torch
A case study of efficient training of large language models using commodity hardware.
☆68 Updated 3 years ago
gnovack / distributed-training-and-deepspeed
☆17 Updated 2 years ago
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆73 Updated last year
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆168 Updated 10 months ago
eth-easl / fmengine
Utilities for Training Very Large Models
☆58 Updated last year