MoonshotAI/Attention-Residuals

MoonshotAI / Attention-Residuals

☆3,351

Alternatives and similar repositories for Attention-Residuals

Users who are interested in Attention-Residuals are comparing it to the repositories listed below.

MoonshotAI / Kimi-Linear
☆1,457 · Nov 17, 2025 · Updated 8 months ago
deepseek-ai / Engram
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
☆4,534 · Jan 14, 2026 · Updated 6 months ago
fla-org / flash-linear-attention
🚀 Efficient implementations for emerging model architectures
☆5,367 · Updated this week
MoonshotAI / FlashKDA
FlashKDA: high-performance Kimi Delta Attention kernels
☆455 · May 26, 2026 · Updated last month
bytetriper / RAE
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
☆1,977 · Feb 25, 2026 · Updated 4 months ago
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,533 · Updated this week
mit-han-lab / flash-moba
☆250 · Nov 19, 2025 · Updated 8 months ago
hustvl / MoDA
A hardware-aware efficient implementation of "Mixture-of-Depths Attention".
☆273 · May 6, 2026 · Updated 2 months ago
deepseek-ai / TileKernels
A kernel library written in tilelang
☆1,642 · Apr 23, 2026 · Updated 2 months ago
deepseek-ai / DeepSeek-V3.2-Exp
☆1,620 · Nov 18, 2025 · Updated 8 months ago
KellerJordan / Muon
Muon is an optimizer for hidden layers in neural networks
☆2,705 · May 24, 2026 · Updated last month
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,542 · Updated this week
ByteDance-Seed / VeOmni
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
☆2,095 · Updated this week
facebookresearch / dinov3
Reference PyTorch implementation and models for DINOv3
☆10,967 · Updated this week
qiuzh20 / gated_attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…
☆969 · Dec 20, 2025 · Updated 7 months ago
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆731 · Jul 4, 2026 · Updated 2 weeks ago
MiniMax-AI / MSA
☆379 · Jun 15, 2026 · Updated last month
karpathy / autoresearch
AI agents running research on single-GPU nanochat training automatically
☆91,539 · Mar 26, 2026 · Updated 3 months ago
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆6,667 · Updated this week
LTH14 / JiT
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
☆2,457 · Dec 8, 2025 · Updated 7 months ago
deepseek-ai / DeepSpec
DeepSpec: a full-stack codebase for training and evaluating speculative decoding algorithms
☆6,696 · Jul 9, 2026 · Updated last week
MoonshotAI / MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
☆2,148 · Apr 3, 2025 · Updated last year
SandAI-org / MagiAttention
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆882 · Updated this week
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,498 · Updated this week
lillian039 / ELF
☆933 · Jun 26, 2026 · Updated 3 weeks ago
tokenbender / mHC-manifold-constrained-hyper-connections
implementations and experimentation on mHC by deepseek - https://arxiv.org/abs/2512.24880
☆367 · Feb 17, 2026 · Updated 5 months ago
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆969 · Jul 4, 2026 · Updated 2 weeks ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆623 · Mar 13, 2026 · Updated 4 months ago
Jiawei-Yang / FD-Loss
☆544 · May 1, 2026 · Updated 2 months ago
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,504 · Aug 3, 2025 · Updated 11 months ago
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆24,489 · Updated this week
QwenLM / FlashQLA
high-performance linear attention kernel library built on TileLang
☆597 · Updated this week
Gen-Verse / OpenClaw-RL
OpenClaw-RL: Train any agent simply by talking
☆5,588 · May 23, 2026 · Updated last month
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,620 · Jan 30, 2026 · Updated 5 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆726 · Sep 24, 2025 · Updated 9 months ago
facebookresearch / tuna-2
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
☆737 · Updated this week
deepseek-ai / DeepSeek-OCR
Contexts Optical Compression
☆23,608 · Jan 27, 2026 · Updated 5 months ago
baaivision / Emu3.5
Native Multimodal Models are World Learners
☆1,535 · Dec 30, 2025 · Updated 6 months ago
yifan123 / flow_grpo
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
☆2,420 · May 7, 2026 · Updated 2 months ago