pytorch-labs / attention-gym
Helpful tools and examples for working with flex-attention
★746 · Updated 3 weeks ago
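attention-gym collects tools and recipes for PyTorch's FlexAttention API (torch.nn.attention.flex_attention). As orientation, here is a minimal sketch of that API, assuming PyTorch 2.5+; the shapes, the causal mask, and the relative-position bias are illustrative choices, not code taken from the repository.

```python
# Minimal FlexAttention sketch (assumes PyTorch >= 2.5).
# Shapes and the 0.01 bias slope are illustrative assumptions.
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 2, 4, 256, 64
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

def causal(b, h, q_idx, kv_idx):
    # mask_mod: keep only keys at or before the query position
    return q_idx >= kv_idx

# BlockMask lets FlexAttention skip fully masked blocks entirely
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S, device=q.device)

def rel_bias(score, b, h, q_idx, kv_idx):
    # score_mod: add a simple relative-position bias to each attention score
    return score + 0.01 * (q_idx - kv_idx)

out = flex_attention(q, k, v, score_mod=rel_bias, block_mask=block_mask)
```

In practice `flex_attention` is usually wrapped with `torch.compile` so the score_mod and mask are fused into a single attention kernel; the eager call above is just the uncompiled fallback.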
Alternatives and similar repositories for attention-gym:
Users who are interested in attention-gym are comparing it to the libraries listed below.
- Implementation of Ring Attention, from Liu et al. at Berkeley AI, in Pytorch (★511 · Updated 6 months ago)
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. (★536 · Updated last week)
- Muon optimizer: +>30% sample efficiency with <3% wallclock overhead (★597 · Updated last month)
- Large Context Attention (★707 · Updated 3 months ago)
- Annotated version of the Mamba paper (★483 · Updated last year)
- Scalable and Performant Data Loading (★252 · Updated this week)
- Ring attention implementation with flash attention (★757 · Updated 3 weeks ago)
- Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" (★647 · Updated last month)
- Pipeline Parallelism for PyTorch (★765 · Updated 8 months ago)
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores (★316 · Updated 4 months ago)
- Efficient implementations of state-of-the-art linear attention models in Torch and Triton (★2,344 · Updated this week)
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (★244 · Updated this week)
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 (★409 · Updated 3 weeks ago)
- This repository contains the experimental PyTorch native float8 training UX (★224 · Updated 9 months ago)
- Efficient LLM Inference over Long Sequences (★372 · Updated last week)
- Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory" (★377 · Updated last year); a minimal sketch of the chunked-attention idea appears after this list
- Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper (★607 · Updated last month)
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA (★820 · Updated this week)
- Tutel MoE: Optimized Mixture-of-Experts Library, Support DeepSeek FP8/FP4 (★814 · Updated this week)
- Triton-based implementation of Sparse Mixture of Experts. (★212 · Updated 5 months ago)
- LLM KV cache compression made easy (★471 · Updated this week)
- Flash Attention in ~100 lines of CUDA (forward pass only) (★796 · Updated 4 months ago)
- Building blocks for foundation models. (★487 · Updated last year)
- For optimization algorithm research and development. (★509 · Updated this week)
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference (★488 · Updated 2 weeks ago)
- [ICLR2025 Spotlight] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (★555 · Updated 2 months ago)
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI (★281 · Updated last month)
- Microsoft Automatic Mixed Precision Library (★596 · Updated 7 months ago)
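For the memory-efficient attention entry above, the core idea of "Self-attention Does Not Need O(n²) Memory" is to process queries and keys in chunks with an online (log-sum-exp) softmax, so the full n×n score matrix is never materialized at once. The sketch below is a plain-PyTorch illustration under assumed shapes and chunk sizes, not code from that repository.

```python
# Illustrative chunked attention with an online log-sum-exp softmax,
# in the spirit of "Self-attention Does Not Need O(n^2) Memory".
# Chunk sizes and tensor shapes are assumptions, not the repo's API.
import torch

def chunked_attention(q, k, v, q_chunk=128, kv_chunk=128):
    # q, k, v: (batch, heads, seq, dim)
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for qs in range(0, q.shape[2], q_chunk):
        q_c = q[:, :, qs:qs + q_chunk] * scale
        acc = torch.zeros_like(q_c)                        # running weighted sum of values
        lse = torch.full(q_c.shape[:-1] + (1,), float("-inf"),
                         dtype=q.dtype, device=q.device)   # running log-sum-exp
        for ks in range(0, k.shape[2], kv_chunk):
            k_c = k[:, :, ks:ks + kv_chunk]
            v_c = v[:, :, ks:ks + kv_chunk]
            s = q_c @ k_c.transpose(-2, -1)                # only (q_chunk x kv_chunk) scores live
            new_lse = torch.logaddexp(lse, s.logsumexp(dim=-1, keepdim=True))
            # rescale the old accumulator, then add this chunk's contribution
            acc = acc * (lse - new_lse).exp() + (s - new_lse).exp() @ v_c
            lse = new_lse
        out[:, :, qs:qs + q_chunk] = acc
    return out

q = k = v = torch.randn(1, 4, 512, 64)
assert torch.allclose(chunked_attention(q, k, v),
                      torch.nn.functional.scaled_dot_product_attention(q, k, v),
                      atol=1e-4)
```

Peak memory per step is one (q_chunk × kv_chunk) score tile plus the running accumulators, which is how the quadratic score matrix is avoided; the final assert just checks the result against PyTorch's reference scaled-dot-product attention.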