warner-benjamin / optimi

Fast, Modern, and Low Precision PyTorch Optimizers

☆116

Alternatives and similar repositories for optimi

Users who are interested in optimi are comparing it to the libraries listed below.

mgmalek / efficient_cross_entropy
☆121 Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83 Updated 10 months ago
catie-aq / flashT5
A fast implementation of T5/UL2 in PyTorch using Flash Attention
☆107 Updated 7 months ago
cloneofsimo / min-fsdp
☆91 Updated last year
cat-state / tinypar
☆20 Updated 2 years ago
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintainable TPU-Training
☆51 Updated last year
ClashLuke / SOAP
☆21 Updated 11 months ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆86 Updated 3 years ago
euclaise / supertrainer2000
☆50 Updated last year
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filtered 8K sequences on Pile
☆115 Updated 2 years ago
pbelcak / fastfeedforward
A repository for log-time feedforward networks
☆222 Updated last year
apoorvkh / torchrunx
Easily run PyTorch on multiple GPUs & machines
☆47 Updated 2 weeks ago
lessw2020 / transformer_central
Various transformers for FSDP research
☆38 Updated 2 years ago
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆37 Updated 2 years ago
HomebrewML / HeavyBall
Efficient optimizers
☆274 Updated last week
fal-ai-community / NativeSparseAttention
research impl of Native Sparse Attention (2502.11089)
☆62 Updated 8 months ago
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆60 Updated last week
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230 Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation of Maximum (multi-node, FSDP) GPT training
☆132 Updated last year
epfml / DenseFormer
☆81 Updated last year
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89 Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90 Updated last year
dvruette / barrel-rec-pytorch
☆53 Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆102 Updated 10 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 Updated last year
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆66 Updated last year
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 6 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆233 Updated last month
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆130 Updated 3 months ago
BlinkDL / SmallInitEmb
LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence
☆58 Updated 3 years ago