dtunai / Tri-RMSNorm
Efficient kernel for RMS normalization with fused operations. Includes both forward and backward passes and is compatible with PyTorch.
☆12 Updated last year
Alternatives and similar repositories for Tri-RMSNorm
Users who are interested in Tri-RMSNorm are comparing it to the libraries listed below.
- Implementation of the proposed minGRU in PyTorch ☆319 Updated 2 months ago
- Accelerated First Order Parallel Associative Scan ☆196 Updated last month
- Implementation of the proposed Adam-atan2 from Google Deepmind in PyTorch ☆135 Updated 3 months ago
- optimizer & lr scheduler & loss function collections in PyTorch ☆387 Updated this week
- Efficient optimizers ☆281 Updated last month
- Grams: Gradient Descent with Adaptive Momentum Scaling (ICLR 2025 Workshop) ☆17 Updated 11 months ago
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 ☆452 Updated 8 months ago
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆172 Updated last week
- Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models' ☆20 Updated last year
- Speed up the attention computation of Swin Transformer ☆31 Updated 7 months ago
- Official implementation of the paper "Distilling a Pretrained Language Model to a Multilingual ASR Model" (Interspeech 2022) ☆12 Updated last year
- RWKV, in easy to read code ☆72 Updated 10 months ago
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆98 Updated 2 weeks ago
- ☆250 Updated last year
- A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http… ☆106 Updated 2 years ago
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆341 Updated last year
- JAX Implementations of Descript Audio Codec and EnCodec ☆33 Updated 10 months ago
- [INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset ☆12 Updated 4 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆88 Updated last year
- A repository for log-time feedforward networks ☆224 Updated last year
- ☆163 Updated 3 years ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, don't use it for Adam ☆85 Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older. ☆186 Updated last year
- [ICLR 2025] AdaFisher: Adaptive Second Order Optimization via Fisher Information ☆51 Updated last year
- Inspired by "Neural Networks Fail to Learn Periodic Functions and How to Fix It" ☆73 Updated 7 months ago
- PyTorch implementation of Simplified Structured State-Spaces for Sequence Modeling (S5) ☆82 Updated last year
- Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction ☆13 Updated last year
- Conformer block with Rotary Position Embedding, modified from lucidrains' implementation ☆16 Updated last year
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. ☆84 Updated 2 months ago
- A PyTorch wrapper of parallel exclusive scan in CUDA ☆12 Updated 2 years ago