RobinWu218 / ToSTLinks

[ICLR 2025 Spotlight] Official Implementation for ToST (Token Statistics Transformer)

☆110

Alternatives and similar repositories for ToST

Users that are interested in ToST are comparing it to the libraries listed below

Sorting:

TsinghuaC3I / Fourier-Position-Embedding
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆76Updated last month
zhixuan-lin / forgetting-transformer
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
☆116Updated last week
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆185Updated 4 months ago
fla-org / flash-bidirectional-linear-attention
Triton implement of bi-directional (non-causal) linear attention
☆52Updated 5 months ago
LINs-lab / UCGM
[Preprint] UCGM: Unified Continuous Generative Models
☆161Updated last month
HanGuo97 / log-linear-attention
☆222Updated last month
VainF / TinyFusion
[CVPR 2025 Highlight] TinyFusion: Diffusion Transformers Learned Shallow
☆130Updated 3 months ago
ML-GSAI / LLaDA-V
☆174Updated 3 weeks ago
feizc / Dimba
Transformer-Mamba Diffusion Models
☆110Updated last year
lucidrains / maskbit-pytorch
Implementation of the proposed MaskBit from Bytedance AI
☆82Updated 8 months ago
ApexGen-X / MergeVQ
[CVPR] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization
☆36Updated 3 weeks ago
jacklishufan / OmniFlows
The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
☆76Updated last month
tensorgi / TPA
The official implementation of TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)
☆376Updated 2 weeks ago
keshik6 / grafting
Exploring Diffusion Transformer Designs via Grafting
☆45Updated last month
goombalab / hydra
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
☆144Updated 5 months ago
jacklishufan / LaViDa
Official Implementation of LaViDa: :A Large Diffusion Language Model for Multimodal Understanding
☆115Updated 3 weeks ago
hp-l33 / AiM
Official PyTorch Implementation of "Scalable Autoregressive Image Generation with Mamba"
☆137Updated 6 months ago
horseee / dKV-Cache
☆88Updated last month
yu-rp / KANbeFair
A More Fair and Comprehensive Comparison between KAN and MLP
☆171Updated 11 months ago
horseee / learning-to-cache
[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
☆107Updated last year
qhfan / RALA
[CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention
☆25Updated 4 months ago
MCG-NJU / DDT
DDT: Decoupled Diffusion Transformer
☆264Updated 2 weeks ago
apple / ml-flextok
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
☆216Updated last month
cfifty / rotation_trick
☆130Updated 3 months ago
CompVis / discrete-interpolants
The official implementation of "[MASK] is All You Need"
☆121Updated 4 months ago
feizc / Diffusion-RWKV
Scaling RWKV-Like Architectures for Diffusion Models
☆135Updated last year
NUS-HPC-AI-Lab / Dynamic-Diffusion-Transformer
☆82Updated 3 months ago
lucidrains / multimodal-dit-pytorch
Implementation of a multimodal diffusion transformer in Pytorch
☆102Updated last year
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆76Updated 7 months ago
LINs-lab / GMem
[Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models
☆38Updated 4 months ago