tensorgi / T6
The official implementation of Tensor ProducT ATTenTion Transformer (T6)
⭐302, updated this week
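As a rough orientation (not the official tensorgi/T6 code): tensor product attention builds each token's query/key/value heads as a low-rank sum of outer products between a head factor and a head-dimension factor, so only small factors need to be cached rather than full per-head keys and values. The sketch below illustrates that factorization idea under assumed names (`TensorProductQKV`, `head_factor`, `dim_factor`, `rank` are illustrative); the actual repository factorizes Q, K, and V separately and handles details such as RoPE and KV-cache layout.

```python
# Minimal, unofficial sketch of a tensor-product (sum-of-outer-products)
# Q/K/V factorization in PyTorch. Names and the 1/rank normalization are
# assumptions for illustration, not the tensorgi/T6 implementation.
import torch
import torch.nn as nn

class TensorProductQKV(nn.Module):
    """Builds per-token head activations as a sum of `rank` outer products
    (head factor x head-dim factor) instead of one dense projection."""
    def __init__(self, d_model: int, n_heads: int, d_head: int, rank: int):
        super().__init__()
        self.n_heads, self.d_head, self.rank = n_heads, d_head, rank
        # One head-mixing factor and one head-dimension factor per rank component.
        self.head_factor = nn.Linear(d_model, rank * n_heads)
        self.dim_factor = nn.Linear(d_model, rank * d_head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        b, s, _ = x.shape
        a = self.head_factor(x).view(b, s, self.rank, self.n_heads)  # (b, s, R, h)
        c = self.dim_factor(x).view(b, s, self.rank, self.d_head)    # (b, s, R, d_head)
        # Sum of outer products over the rank dimension -> (b, s, h, d_head).
        return torch.einsum("bsrh,bsrd->bshd", a, c) / self.rank
```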
Alternatives and similar repositories for T6:
Users interested in T6 are comparing it to the libraries listed below.
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models (⭐260, updated this week)
- [ICLR 2025 Spotlight 🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (⭐514, updated last week)
- Normalized Transformer (nGPT) (⭐152, updated 3 months ago)
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI (⭐273, updated 3 months ago)
- Efficient LLM Inference over Long Sequences (⭐357, updated last week)
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (see the key-value memory sketch after this list) (⭐297, updated 2 months ago)
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) (⭐149, updated 2 months ago)
- Code for Adam-mini: Use Fewer Learning Rates To Gain More, https://arxiv.org/abs/2406.16793 (⭐385, updated 2 months ago)
- When it comes to optimizers, it's always better to be safe than sorry (⭐179, updated 3 weeks ago)
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States (⭐397, updated 6 months ago)
- Helpful tools and examples for working with flex-attention (⭐647, updated this week)
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads (⭐426, updated last week)
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule (⭐131, updated last week)
- Muon optimizer: +~30% sample efficiency with <3% wallclock overhead (⭐254, updated last week)
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch (⭐502, updated 3 months ago)
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" (⭐157, updated 3 weeks ago)
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" (⭐221, updated this week)
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores (⭐296, updated last month)
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models (⭐196, updated 3 weeks ago)
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" (⭐215, updated 3 weeks ago)
- PyTorch implementation of models from the Zamba2 series. (⭐176, updated 3 weeks ago)
- [ICML 2024] CLLMs: Consistency Large Language Models (⭐372, updated 3 months ago)
- Implementation of Infini-Transformer in Pytorch (⭐109, updated last month)
- [ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer" (⭐283, updated 2 months ago; see the RoPE sketch after this list)
- The AdEMAMix Optimizer: Better, Faster, Older. (⭐178, updated 5 months ago)
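For the memory-layers entry above, here is a minimal sketch of a trainable key-value lookup layer with top-k sparse reads. It is not that repository's code; `KeyValueMemory`, `num_keys`, and `top_k` are assumed names, and the plain dense scoring here is the simplest stand-in for the idea.

```python
# Hedged sketch of a trainable key-value memory layer (top-k lookup).
# Illustrative only: real memory layers at scale use product-key lookup
# so the search over millions of slots stays cheap.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeyValueMemory(nn.Module):
    """Adds parameters via a learned key/value table; each token reads only
    its top-k entries, so FLOPs stay roughly flat as the table grows."""
    def __init__(self, d_model: int, num_keys: int = 4096, top_k: int = 8):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_keys, d_model) * 0.02)
        self.values = nn.Parameter(torch.randn(num_keys, d_model) * 0.02)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        scores = x @ self.keys.t()                          # (b, s, num_keys)
        top_scores, idx = scores.topk(self.top_k, dim=-1)   # (b, s, k)
        weights = F.softmax(top_scores, dim=-1)             # sparse mixture weights
        gathered = self.values[idx]                         # (b, s, k, d_model)
        return x + torch.einsum("bsk,bskd->bsd", weights, gathered)
```

Production-scale memory layers typically split keys into two half-dimension "product keys" so that the top-k search stays sub-linear in the number of slots; the dense `x @ keys.t()` above is only a toy substitute.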
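For the RoPE-ViT entry, a minimal reminder of what plain 1-D rotary position embedding does to a query or key tensor. The RoPE-ViT repository extends this to 2-D/axial and mixed-frequency variants for vision transformers; `apply_rope` and the split-half channel pairing here are just illustrative choices, not that repository's API.

```python
# Hedged sketch of standard 1-D RoPE applied to a (batch, heads, seq, d_head) tensor.
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotates channel pairs (i, i + d_head//2) by a position-dependent angle
    (the 'split-half' RoPE convention). Requires an even d_head."""
    b, h, s, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, dtype=x.dtype, device=x.device) / half)
    angles = torch.arange(s, dtype=x.dtype, device=x.device)[:, None] * freqs[None, :]  # (s, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation in each plane: (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```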