lucidrains / titans-pytorch
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
☆297 · Updated this week
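For context on what the repo implements: below is a minimal sketch of the core Titans idea, a neural memory module whose weights are updated at test time by gradient descent on a reconstruction ("surprise") loss, with momentum carrying past surprise and weight decay acting as forgetting. The class and method names are illustrative and do not reflect this repo's API.

```python
# Hypothetical sketch of the Titans idea: a small MLP "memory" whose weights
# are updated at *test time* by gradient descent on a reconstruction
# ("surprise") loss. Names here do not match the repo's actual API.
import torch
from torch import nn

class NeuralMemory(nn.Module):
    def __init__(self, dim, lr=1e-2, momentum=0.9, decay=1e-2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.lr, self.momentum, self.decay = lr, momentum, decay
        # per-parameter momentum buffers carry "past surprise"
        self.velocity = [torch.zeros_like(p) for p in self.mlp.parameters()]

    def write(self, key, value):
        # surprise = gradient of how badly the memory maps key -> value
        loss = (self.mlp(key) - value).pow(2).mean()
        grads = torch.autograd.grad(loss, list(self.mlp.parameters()))
        with torch.no_grad():
            for p, v, g in zip(self.mlp.parameters(), self.velocity, grads):
                v.mul_(self.momentum).sub_(g, alpha=self.lr)  # accumulate surprise
                p.mul_(1 - self.decay).add_(v)                # forget, then update

    def read(self, query):
        return self.mlp(query)

mem = NeuralMemory(dim=64)
k, v = torch.randn(8, 64), torch.randn(8, 64)
mem.write(k, v)    # store an association at test time
out = mem.read(k)  # retrieve
```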
Alternatives and similar repositories for titans-pytorch:
Users interested in titans-pytorch are comparing it to the repositories listed below.
- Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆477 · Updated this week
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆270 · Updated 2 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆831 · Updated last month
- PyTorch implementation of "Jamba: A Hybrid Transformer-Mamba Language Model" ☆154 · Updated 2 months ago
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ☆343 · Updated this week
- Code for Adam-mini: Use Fewer Learning Rates To Gain More (https://arxiv.org/abs/2406.16793) ☆376 · Updated last month
- ☆240 · Updated 4 months ago
- ☆152 · Updated last month
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆384 · Updated 5 months ago
- Code for the BLT research paper ☆1,314 · Updated this week
- [NeurIPS 2024] Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" ☆188 · Updated 2 weeks ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆180 · Updated 2 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…" ☆286 · Updated 8 months ago
- Annotated version of the Mamba paper ☆469 · Updated 10 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (see the sketch after this list) ☆277 · Updated last month
- Build high-performance AI models with modular building blocks ☆456 · Updated this week
- An open source implementation of LFMs from Liquid AI: Liquid Foundation Models ☆137 · Updated this week
- Efficient LLM Inference over Long Sequences ☆344 · Updated 2 weeks ago
- [ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation ☆681 · Updated 3 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers ☆273 · Updated 2 months ago
- 🚀 Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton ☆1,669 · Updated this week
- ☆491 · Updated 5 months ago
- ☆304 · Updated 2 weeks ago
- PyTorch implementation of models from the Zamba2 series ☆166 · Updated last month
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆492 · Updated 2 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆210 · Updated last week
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆538 · Updated 6 months ago
- Code repository for Black Mamba ☆234 · Updated 11 months ago
- Training Large Language Models to Reason in a Continuous Latent Space ☆388 · Updated this week
- Helpful tools and examples for working with flex-attention ☆583 · Updated this week
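As referenced in the memory-layers entry above: a minimal sketch of a sparse key-value memory layer, assuming a plain top-k lookup over a large trainable key/value table. It shows how such a layer adds many parameters while only a handful of value rows are mixed per query; the product-key factorization that makes the key search itself cheap is omitted, and none of the names below come from the linked repo.

```python
# Sketch of a sparse key-value memory layer (general idea only, not the
# linked repo's implementation; product-key factorization is omitted).
import torch
from torch import nn
import torch.nn.functional as F

class SparseMemoryLayer(nn.Module):
    def __init__(self, dim, num_slots=65536, topk=4):
        super().__init__()
        # large trainable tables: most of the layer's parameters live here
        self.keys = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.values = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.topk = topk

    def forward(self, x):                            # x: (batch, dim)
        scores = x @ self.keys.t()                   # score query against keys
        weights, idx = scores.topk(self.topk, dim=-1)
        weights = F.softmax(weights, dim=-1)         # normalize over chosen slots
        picked = self.values[idx]                    # (batch, topk, dim)
        # only topk value rows are gathered and mixed per query
        return (weights.unsqueeze(-1) * picked).sum(dim=1)

layer = SparseMemoryLayer(dim=64)
y = layer(torch.randn(8, 64))                        # y: (8, 64)
```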