NX-AI / mlstm_kernelsLinks

Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.

☆72

Alternatives and similar repositories for mlstm_kernels

Users that are interested in mlstm_kernels are comparing it to the libraries listed below

Sorting:

NX-AI / flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
☆103Updated last week
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90Updated last year
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆66Updated last year
lucidrains / light-recurrent-unit-pytorch
Implementation of a Light Recurrent Unit in Pytorch
☆49Updated last year
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆45Updated 10 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆129Updated last year
epfml / DenseFormer
☆81Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆98Updated 11 months ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆102Updated 10 months ago
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun
☆56Updated 7 months ago
lucidrains / hyper-connections
Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public
☆91Updated 4 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
erogol / BlaGPT
Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible…
☆84Updated 2 weeks ago
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆132Updated 2 weeks ago
lucidrains / simplicial-attention
Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…
☆47Updated last month
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89Updated last year
kyleliang919 / Super_Muon
☆65Updated 7 months ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61Updated last year
RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28Updated 5 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆99Updated last year
apple / ml-ademamix
☆68Updated 11 months ago
graphcore-research / jax-scalify
JAX Scalify: end-to-end scaled arithmetics
☆16Updated last year
ruke1ire / RTF
A State-Space Model with Rational Transfer Function Representation.
☆82Updated last year
bluorion-com / ZClip
Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
☆136Updated 2 weeks ago
fal-ai-community / NativeSparseAttention
research impl of Native Sparse Attention (2502.11089)
☆62Updated 8 months ago
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113Updated 9 months ago
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆119Updated last year
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆197Updated 4 months ago
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 6 months ago
evanatyourservice / llm-jax
Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.
☆18Updated 3 months ago