cheind / mingru
Torch MinGRU implementation based on "Were RNNs All We Needed?"
☆20 · Updated last year
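For context, here is a minimal sequential sketch of the minGRU recurrence from the paper: both the update gate and the candidate state depend only on the current input (not on the previous hidden state), which is what lets the recurrence be computed with a parallel prefix scan. The class and variable names below are illustrative, not the repository's API, and the step-by-step loop stands in for the log-space parallel-scan form that practical implementations typically use.

```python
import torch
from torch import nn


class MinGRUSketch(nn.Module):
    """Sequential minGRU recurrence: h_t = (1 - z_t) * h_{t-1} + z_t * h~_t,
    where z_t = sigmoid(W_z x_t) and h~_t = W_h x_t depend only on x_t."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.to_gate = nn.Linear(input_dim, hidden_dim)        # produces z_t
        self.to_candidate = nn.Linear(input_dim, hidden_dim)   # produces h~_t

    def forward(self, x, h=None):
        # x: (batch, seq_len, input_dim)
        batch, seq_len, _ = x.shape
        if h is None:
            h = x.new_zeros(batch, self.to_gate.out_features)
        outputs = []
        for t in range(seq_len):
            z = torch.sigmoid(self.to_gate(x[:, t]))    # update gate from x_t only
            h_tilde = self.to_candidate(x[:, t])        # candidate state from x_t only
            h = (1 - z) * h + z * h_tilde               # convex blend with previous state
            outputs.append(h)
        return torch.stack(outputs, dim=1)              # (batch, seq_len, hidden_dim)


# Usage (shapes only; hyperparameters are arbitrary):
rnn = MinGRUSketch(input_dim=16, hidden_dim=32)
out = rnn(torch.randn(2, 10, 16))
print(out.shape)  # torch.Size([2, 10, 32])
```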
Alternatives and similar repositories for mingru
Users interested in mingru are comparing it to the libraries listed below.
- Implementation of the proposed Adam-atan2 from Google DeepMind in Pytorch ☆135 · Updated 3 months ago
- An implementation of FAdam (Fisher Adam) in PyTorch ☆50 · Updated 7 months ago
- The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch ☆73 · Updated 2 months ago
- Implementation of a Light Recurrent Unit in Pytorch ☆49 · Updated last year
- Implementation of the proposed DeepCrossAttention by Heddes et al. at Google Research, in Pytorch ☆96 · Updated 11 months ago
- Implementation of the transformer proposed in "Building Blocks for a Complex-Valued Transformer Architecture" ☆88 · Updated 2 years ago
- Attempt to make multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public ☆168 · Updated 3 weeks ago
- Implementation of Agent Attention in Pytorch ☆93 · Updated last year
- A PyTorch wrapper of parallel exclusive scan in CUDA ☆12 · Updated 2 years ago
- Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models' ☆20 · Updated last year
- Implementation of the proposed minGRU in Pytorch ☆319 · Updated last month
- Implementations of various linear RNN layers using pytorch and triton ☆54 · Updated 2 years ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. ☆84 · Updated 2 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆174 · Updated 3 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆65 · Updated this week
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆137 · Updated last month
- ☆166 · Updated 3 months ago
- A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752) ☆22 · Updated 2 years ago
- Here we will test various linear attention designs. ☆62 · Updated last year
- [ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition ☆34 · Updated 4 years ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆100 · Updated last year
- A State-Space Model with Rational Transfer Function Representation. ☆83 · Updated last year
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make it practical in Fast and Simplex, Ro… ☆46 · Updated 5 months ago
- A Triton Kernel for incorporating Bi-Directionality in Mamba2 ☆78 · Updated last year
- My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation ☆90 · Updated last year
- Implementation of GateLoop Transformer in Pytorch and Jax ☆92 · Updated last year
- Pytorch Implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers" ☆93 · Updated last week
- Unofficial implementation of the Linear Recurrent Unit (LRU, Orvieto et al. 2023) ☆61 · Updated 5 months ago
- PyTorch implementation of the Flash Spectral Transform Unit. ☆21 · Updated last year
- [ICLR 2025 Spotlight] Official Implementation for ToST (Token Statistics Transformer) ☆130 · Updated 11 months ago