ZixuanJiang / pre-rmsnorm-transformer
☆ 21 · updated last year
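
For context, the RMSNorm in the repo's name normalizes activations by their root-mean-square instead of subtracting a mean and adding a bias as LayerNorm does. A minimal PyTorch sketch (an illustrative module, not code from this repository):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization (Zhang & Sennrich, 2019).

    Unlike LayerNorm, there is no mean subtraction and no bias:
    x is scaled by 1 / sqrt(mean(x^2) + eps) and a learned gain.
    """

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Inverse RMS over the feature dimension, then rescale.
        rms_inv = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms_inv * self.weight
```

Skipping the mean statistics makes RMSNorm slightly cheaper to compute than LayerNorm.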
Related projects:
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. (☆ 34 · updated 2 months ago)
- Simple and fast low-bit matmul kernels in CUDA. (☆ 48 · updated this week)
- Fast training of unitary deep network layers from low-rank updates. (☆ 28 · updated last year)
- Experiment in using Tangent to autodiff Triton. (☆ 66 · updated 7 months ago)
- Personal solutions to the Triton Puzzles. (☆ 11 · updated 2 months ago)
- Official repository for Sparse ISO-FLOP Transformations for Maximizing Training Efficiency. (☆ 23 · updated last month)
- ML/DL math and method notes. (☆ 56 · updated 9 months ago)
- Sparsity support for PyTorch. (☆ 31 · updated 11 months ago)
- Butterfly matrix multiplication in PyTorch. (☆ 160 · updated 11 months ago)
- Code for the papers "Linear Algebra with Transformers" (TMLR) and "What is my Math Transformer Doing?" (AI for Maths Workshop, NeurIPS 2022). (☆ 61 · updated last month)
- This repository contains code for the MicroAdam paper. (☆ 9 · updated 2 months ago)
- A MAD laboratory to improve AI architecture designs 🧪 (☆ 84 · updated 4 months ago)
- Transformer with Mu-Parameterization, implemented in JAX/Flax. Supports FSDP on TPU pods. (☆ 29 · updated 3 weeks ago)
- Fast Hadamard transform in CUDA, with a PyTorch interface; see the sketch after this list. (☆ 87 · updated 3 months ago)
- Explorations into the recently proposed Taylor Series Linear Attention. (☆ 85 · updated last month)
- Memory Optimizations for Deep Learning (ICML 2023). (☆ 58 · updated 6 months ago)
- The simplest, fastest repository for training/finetuning medium-sized GPTs. Now, with kittens! (☆ 45 · updated last month)
- Some personal experiments with routing tokens to different autoregressive attention branches, akin to mixture-of-experts. (☆ 101 · updated last year)
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters. (☆ 94 · updated 2 weeks ago)
- A simple library for scaling up JAX programs. (☆ 116 · updated last month)
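
As a point of reference for the Hadamard-transform entry above, here is a pure-PyTorch fast Walsh-Hadamard transform. This is an illustrative O(n log n) sketch, not the CUDA kernel from that repository, and the function name `fwht` is ours:

```python
import torch

def fwht(x: torch.Tensor) -> torch.Tensor:
    """Fast Walsh-Hadamard transform along the last dimension.

    Pure-PyTorch reference in O(n log n); assumes the last dimension
    is a power of two. Unnormalized: fwht(fwht(x)) == n * x.
    """
    n = x.shape[-1]
    assert n & (n - 1) == 0, "last dimension must be a power of two"
    y = x
    h = 1
    while h < n:
        # Split each block of 2h entries into halves (a, b) and
        # replace them with (a + b, a - b): one butterfly stage.
        y = y.reshape(*x.shape[:-1], n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((a + b, a - b), dim=-2)
        h *= 2
    return y.reshape(x.shape)
```

Because the Hadamard matrix satisfies H·H = n·I, applying `fwht` twice and dividing by n recovers the input, which makes the sketch easy to sanity-check.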