test-time-training / ttt-tk
☆37 · Updated 3 months ago
Alternatives and similar repositories for ttt-tk
Users interested in ttt-tk are comparing it to the libraries listed below.
- Stick-breaking attention ☆58 · Updated last week
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆89 · Updated 2 weeks ago
- Fast and memory-efficient exact attention ☆68 · Updated 4 months ago
- ☆48 · Updated last year
- ☆76 · Updated 4 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆138 · Updated last month
- ☆82 · Updated 10 months ago
- Here we will test various linear attention designs. ☆60 · Updated last year
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆115 · Updated last week
- The evaluation framework for training-free sparse attention in LLMs ☆82 · Updated 3 weeks ago
- Using FlexAttention to compute attention with different masking patterns ☆44 · Updated 9 months ago
- ☆222 · Updated last month
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆185 · Updated 3 months ago
- Combining SOAP and MUON ☆16 · Updated 5 months ago
- Efficient PScan implementation in PyTorch ☆16 · Updated last year
- ☆37 · Updated last year
- JAX bindings for Flash Attention v2 ☆90 · Updated 11 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆78 · Updated last month
- Mixture of A Million Experts ☆46 · Updated 11 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆128 · Updated 7 months ago
- Awesome Triton Resources ☆31 · Updated 2 months ago
- ☆32 · Updated last year
- ☆53 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆48 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆40 · Updated last year
- ☆55 · Updated last year
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Updated last year
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆52 · Updated 7 months ago
- DPO, but faster 🚀 ☆43 · Updated 7 months ago
- Transformers components but in Triton ☆34 · Updated 2 months ago