test-time-training / ttt-tk
☆40 Updated 4 months ago
Alternatives and similar repositories for ttt-tk
Users who are interested in ttt-tk are comparing it to the libraries listed below.
- Using FlexAttention to compute attention with different masking patterns ☆44 Updated 11 months ago
- Here we will test various linear attention designs. ☆62 Updated last year
- Efficient PScan implementation in PyTorch ☆16 Updated last year
- Stick-breaking attention ☆59 Updated last month
- The evaluation framework for training-free sparse attention in LLMs ☆90 Updated 2 months ago
- ☆49 Updated last year
- Fast and memory-efficient exact attention ☆70 Updated 5 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆113 Updated 2 months ago
- Triton Implementation of HyperAttention Algorithm ☆48 Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆160 Updated 2 months ago
- 📄 Small Batch Size Training for Language Models ☆42 Updated 2 weeks ago
- Mixture of A Million Experts ☆46 Updated last year
- ☆56 Updated last year
- Parallel Associative Scan for Language Models ☆18 Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆40 Updated last year
- ☆237 Updated 2 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆84 Updated last month
- ☆33 Updated last year
- Simple and efficient pytorch-native transformer training and inference (batched) ☆78 Updated last year
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆54 Updated 8 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆124 Updated last week
- ☆80 Updated 5 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆128 Updated 8 months ago
- Experiments on the impact of depth in transformers and SSMs. ☆33 Updated 9 months ago
- Awesome Triton Resources ☆33 Updated 3 months ago
- ☆85 Updated last year
- ☆87 Updated last year
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆56 Updated 5 months ago
- ☆53 Updated last year
- Experiment of using Tangent to autodiff triton ☆80 Updated last year