m-a-n-i-f-e-s-t / power-attention
Attention Kernels for Symmetric Power Transformers
☆129 · Updated 4 months ago (Sep 25, 2025)
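For context: in power attention, the softmax of standard attention is replaced by an integer power of the query-key dot product, which (for even powers) also admits a linear-cost recurrent form via symmetric tensor powers of the keys and queries. The snippet below is a minimal, naive quadratic-time sketch of that scoring rule for a single head; the function name `power_attention`, the degree `p`, and the sum normalization are illustrative assumptions, not this repository's fused kernels.

```python
import numpy as np

def power_attention(Q, K, V, p=2, eps=1e-6):
    """Naive single-head power attention: scores are (q.k)^p instead of softmax(q.k).

    Q, K, V: arrays of shape (seq_len, head_dim). `p` should be even so scores are
    non-negative. This quadratic-time reference is for illustration only; the
    repository's kernels target an equivalent linear-cost formulation.
    """
    scores = (Q @ K.T) ** p                        # (seq_len, seq_len) raw power scores
    mask = np.tril(np.ones_like(scores))           # causal mask: attend only to past positions
    scores = scores * mask
    weights = scores / (scores.sum(axis=-1, keepdims=True) + eps)  # row-normalize
    return weights @ V                             # weighted sum of values

# Tiny usage example with random data
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
out = power_attention(Q, K, V, p=2)
print(out.shape)  # (8, 16)
```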
Alternatives and similar repositories for power-attention
Users interested in power-attention are comparing it to the libraries listed below.
- Scalable and Stable Parallelization of Nonlinear RNNs · ☆29 · Updated 3 months ago (Oct 21, 2025)
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) · ☆24 · Updated last year (Jun 6, 2024)
- ☆53 · Updated last year (May 20, 2024)
- sigma-MoE layer · ☆21 · Updated 2 years ago (Jan 5, 2024)
- 📄 Small Batch Size Training for Language Models · ☆80 · Updated 4 months ago (Oct 4, 2025)
- ☆35 · Updated last year (Apr 12, 2024)
- HGRN2: Gated Linear RNNs with State Expansion · ☆56 · Updated last year (Aug 20, 2024)
- Transformers components but in Triton · ☆34 · Updated 9 months ago (May 9, 2025)
- ☆19 · Updated 2 months ago (Dec 4, 2025)
- Code for the paper "Function-Space Learning Rates" · ☆25 · Updated 8 months ago (Jun 3, 2025)
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf · ☆21 · Updated last year (Jul 29, 2024)
- Griffin MQA + Hawk Linear RNN Hybrid · ☆89 · Updated last year (Apr 26, 2024)
- ☆24 · Updated last year (Sep 25, 2024)
- A repository for research on medium-sized language models · ☆77 · Updated last year (May 23, 2024)
- POPGym Library in JAX · ☆12 · Updated last year (Apr 15, 2024)
- Display tensors directly from GPU · ☆11 · Updated 4 months ago (Oct 12, 2025)
- Unofficial implementation of paper: Exploring the Space of Key-Value-Query Models with Intention