apple / ml-ademamix

☆61

Alternatives and similar repositories for ml-ademamix

Users who are interested in ml-ademamix are comparing it to the libraries listed below.

ethansmith2000 / fsdp_optimizers
Supporting PyTorch FSDP for optimizers
☆82 Updated 7 months ago
NX-AI / mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆64 Updated 2 weeks ago
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun
☆54 Updated 4 months ago
shikaiqiu / compute-better-spent
☆53 Updated 9 months ago
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch
☆25 Updated 5 months ago
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆182 Updated 10 months ago
epfml / DenseFormer
☆81 Updated last year
fal-ai-community / NativeSparseAttention
Research implementation of Native Sparse Attention (arXiv:2502.11089)
☆54 Updated 4 months ago
cloneofsimo / zeroshampoo
☆34 Updated 10 months ago
dvruette / barrel-rec-pytorch
☆53 Updated last year
cloneofsimo / min-fsdp
☆79 Updated last year
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆99 Updated 10 months ago
fal-ai / diffusion-speedrun
Focused on fast experimentation and simplicity
☆76 Updated 6 months ago
nikhilvyas / SOAP
☆197 Updated 7 months ago
ethansmith2000 / TransformerExperiments
☆19 Updated 2 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training
☆129 Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆101 Updated 6 months ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆89 Updated last year
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆97 Updated this week
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
kvfrans / splus
☆110 Updated last month
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21 Updated last month
cloneofsimo / ezmup
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam.
☆82 Updated 11 months ago
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆110 Updated 7 months ago
joey00072 / ohara
Collection of autoregressive model implementations
☆85 Updated 2 months ago
srush / triton-autodiff
Experiment in using Tangent to autodiff Triton
☆79 Updated last year
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆46 Updated last year
NVlabs / Forecasting-Model-Search
A system for automating selection and optimization of pre-trained models from the TAO Model Zoo
☆25 Updated last year
SHI-Labs / CompactNet
☆31 Updated last year
apple / ml-planner
☆53 Updated last year