ZixuanJiang / pre-rmsnorm-transformer
☆24 Updated 2 years ago
Alternatives and similar repositories for pre-rmsnorm-transformer
Users interested in pre-rmsnorm-transformer are comparing it to the repositories listed below
- train with kittens! ☆61 Updated 8 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 Updated last year
- Experiment of using Tangent to autodiff triton ☆79 Updated last year
- Fast and memory-efficient exact attention ☆68 Updated 4 months ago
- A parallel framework for training deep neural networks ☆62 Updated 4 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆46 Updated this week
- ML/DL Math and Method notes ☆61 Updated last year
- Butterfly matrix multiplication in PyTorch ☆172 Updated last year
- ☆210 Updated 2 years ago
- ☆27 Updated last year
- ☆106 Updated 10 months ago
- Simple and efficient pytorch-native transformer training and inference (batched) ☆77 Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆123 Updated 7 months ago
- Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency ☆25 Updated 11 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆64 Updated 3 months ago
- Sparsity support for PyTorch ☆35 Updated 3 months ago
- Personal solutions to the Triton Puzzles ☆19 Updated last year
- ring-attention experiments ☆143 Updated 9 months ago
- ☆320 Updated 2 weeks ago
- ☆53 Updated 9 months ago
- ☆166 Updated 2 years ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆85 Updated last year
- Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural netwo… ☆71 Updated 2 weeks ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆99 Updated 11 months ago
- Collection of kernels written in Triton language ☆136 Updated 3 months ago
- Make triton easier ☆47 Updated last year
- Accelerated First Order Parallel Associative Scan ☆182 Updated 10 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 Updated last year
- ☆37 Updated last year
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆323 Updated 6 months ago