qiuzh20 / gated_attentionLinks

The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

☆273

Alternatives and similar repositories for gated_attention

Users that are interested in gated_attention are comparing it to the libraries listed below

Sorting:

hanyang1999 / discrete-diffusion-papers
A collection of papers on discrete diffusion models
☆166Updated 5 months ago
OpenSparseLLMs / MoM
☆110Updated 2 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆382Updated 2 months ago
zhijie-group / Discrete-Diffusion-Forcing
Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference
☆205Updated 2 months ago
TsinghuaC3I / Fourier-Position-Embedding
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆104Updated 6 months ago
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆134Updated last month
LINs-lab / DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆147Updated 4 months ago
jxiw / M1
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
☆45Updated 4 months ago
Gen-Verse / dLLM-RL
TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
☆339Updated 2 weeks ago
yczhou001 / Awesome-Diffusion-LLM
paper list, tutorial, and nano code snippet for Diffusion Large Language Models.
☆136Updated 5 months ago
maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆186Updated 3 weeks ago
sustcsonglin / linear-attention-and-beyond-slides
☆99Updated 9 months ago
attention-survey / Efficient_Attention_Survey
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
☆240Updated last week
OpenSparseLLMs / Linearization
☆61Updated 5 months ago
ML-GSAI / LLaDA-V
☆293Updated last month
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆99Updated 11 months ago
RobinWu218 / ToST
[ICLR 2025 Spotlight] Official Implementation for ToST (Token Statistics Transformer)
☆127Updated 9 months ago
ML-GSAI / Diffusion-LLM-Papers
A Collection of Papers on Diffusion Language Models
☆148Updated 2 months ago
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆256Updated 5 months ago
zitian-gao / one-shot-em
One-shot Entropy Minimization
☆187Updated 5 months ago
OpenSparseLLMs / Skip-DiT
✈️ [ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints
☆78Updated 4 months ago
horseee / dKV-Cache
[NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models
☆120Updated 6 months ago
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆251Updated 6 months ago
thu-nics / R2R
[NeurIPS'25] The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok…
☆65Updated this week
pengzhangzhi / Open-dLLM
The most open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.
☆469Updated 3 weeks ago
fscdc / Awesome-Efficient-Reasoning-Models
[TMLR 2025] Efficient Reasoning Models: A Survey
☆282Updated last month
JieShibo / MoLE
[ICML 2025 Oral] Mixture of Lookup Experts
☆56Updated this week
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆314Updated 3 weeks ago
LiangrunFlora / Slow-Fast-Sampling
Official PyTorch implementation of the paper "Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Princ…
☆35Updated 4 months ago
HanGuo97 / log-linear-attention
☆256Updated 6 months ago