Doraemonzzz / xmixers
View external linksLinks

Xmixers: A collection of SOTA efficient token/channel mixers

☆28

Alternatives and similar repositories for xmixers

Users that are interested in xmixers are comparing it to the libraries listed below

Sorting:

glassroom / heinsen_attention
View on GitHub
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Jun 6, 2024Updated last year
dangxingyu / rnn-icrag
View on GitHub
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Apr 17, 2024Updated last year
kazuki-irie / kv-memory-brain
View on GitHub
Official Code Repository for the paper "Key-value memory in the brain"
☆31Feb 25, 2025Updated 11 months ago
OpenNLPLab / HGRN2
View on GitHub
HGRN2: Gated Linear RNNs with State Expansion
☆56Aug 20, 2024Updated last year
maximzubkov / fft-scan
View on GitHub
Efficient PScan implementation in PyTorch
☆17Jan 2, 2024Updated 2 years ago
BlinkDL / LinearAttentionArena
View on GitHub
Here we will test various linear attention designs.
☆62Apr 25, 2024Updated last year
howard-hou / RWKV-X
View on GitHub
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…
☆54Jan 12, 2026Updated last month
Yifei-Zuo / Flash-LLA
View on GitHub
Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…
☆23Oct 1, 2025Updated 4 months ago
Doraemonzzz / hgru2-pytorch
View on GitHub
☆24Sep 25, 2024Updated last year
sustcsonglin / disco-pointer
View on GitHub
Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span …
☆14Aug 25, 2023Updated 2 years ago
ACA-Lab-SJTU / token-ring
View on GitHub
☆13Jan 7, 2025Updated last year
Noahs-ARK / PaLM
View on GitHub
PyTorch implementation for PaLM: A Hybrid Parser and Language Model.
☆10Jan 7, 2020Updated 6 years ago
rycolab / aflt-f2023
View on GitHub
Advanced Formal Language Theory (263-5352-00L; Frühjahr 2023)
☆10Feb 21, 2023Updated 2 years ago
Doraemonzzz / nanoTransNormer
View on GitHub
☆11Oct 11, 2023Updated 2 years ago
HydraQYH / hp_rms_norm
View on GitHub
High performance RMSNorm Implement by using SM Core Storage(Registers and Shared Memory)
☆26Jan 22, 2026Updated 3 weeks ago
bdusell / stack-attention
View on GitHub
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Mar 15, 2024Updated last year
FieteLab / MESH
View on GitHub
☆14Jul 12, 2022Updated 3 years ago
tile-ai / AttentionEngine
View on GitHub
☆52May 19, 2025Updated 8 months ago
epfml / pam
View on GitHub
☆16Dec 9, 2023Updated 2 years ago
nil0x9 / flash-muon
View on GitHub
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆235Jun 15, 2025Updated 8 months ago
Shark-NLP / CAB
View on GitHub
☆31Jul 2, 2023Updated 2 years ago
sail-sg / SkyLadder
View on GitHub
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆42Dec 29, 2025Updated last month
WaveSpeedAI / QuantumAttention
View on GitHub
[WIP] Better (FP8) attention for Hopper
☆32Feb 24, 2025Updated 11 months ago
sgl-project / DeepGEMM
View on GitHub
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆22Feb 9, 2026Updated last week
tile-ai / TileOPs
View on GitHub
☆86Feb 10, 2026Updated last week
mit-han-lab / flash-moba
View on GitHub
☆223Nov 19, 2025Updated 2 months ago
srush / tangent
View on GitHub
Source-to-Source Debuggable Derivatives in Pure Python
☆15Jan 23, 2024Updated 2 years ago
proger / nanokitchen
View on GitHub
Parallel Associative Scan for Language Models
☆18Jan 8, 2024Updated 2 years ago
gmongaras / Cottention_Transformer
View on GitHub
Code for the paper "Cottention: Linear Transformers With Cosine Attention"
☆20Nov 15, 2025Updated 3 months ago
zhangjiong724 / spectral-RNN
View on GitHub
STABILIZING GRADIENTS FOR DEEP NEURAL NETWORKS VIA EFFICIENT SVD PARAMETERIZATION
☆16Jun 5, 2018Updated 7 years ago
jungokasai / T2R
View on GitHub
☆14Nov 20, 2022Updated 3 years ago
berlino / gated_linear_attention
View on GitHub
☆106Mar 9, 2024Updated last year
radarFudan / mamba-minimal-jax
View on GitHub
☆35Nov 22, 2024Updated last year
li-plus / flash-preference
View on GitHub
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆51Jul 4, 2025Updated 7 months ago
yikangshen / megablocks
View on GitHub
☆20May 30, 2024Updated last year
srush / mamba-primer
View on GitHub
☆39Apr 5, 2024Updated last year
yifanzhang-pro / HLA
View on GitHub
Official Project Page for HLA: Higher-order Linear Attention (https://arxiv.org/abs/2510.27258)
☆44Jan 6, 2026Updated last month
yangjackie / Topics-on-diffusion-generative-models
View on GitHub
☆28Oct 2, 2025Updated 4 months ago
SqueezeAILab / SqueezedAttention
View on GitHub
[ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference
☆56Nov 20, 2024Updated last year

Doraemonzzz / xmixersView external linksLinks

Alternatives and similar repositories for xmixers

Doraemonzzz / xmixers
View external linksLinks