lucidrains / titans-pytorch
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
★1,455 · Updated 3 months ago
Alternatives and similar repositories for titans-pytorch
Users that are interested in titans-pytorch are comparing it to the libraries listed below
- Code for BLT research paper ★1,983 · Updated 3 months ago
- A Self-adaptation Framework that adapts LLMs for unseen tasks in real-time! ★1,141 · Updated 7 months ago
- Muon is an optimizer for hidden layers in neural networks ★1,710 · Updated 2 months ago
- Continuous Thought Machines, because thought takes time and reasoning is a process. ★1,294 · Updated 2 months ago
- Official PyTorch implementation for "Large Language Diffusion Models" ★2,864 · Updated last week
- Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ★1,252 · Updated last year
- Efficient implementations of state-of-the-art linear attention models ★3,281 · Updated this week
- Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper ★740 · Updated last month
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ★908 · Updated 4 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model ★826 · Updated last week
- Build high-performance AI models with modular building blocks ★550 · Updated last week
- [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ★570 · Updated 7 months ago
- H-Net: Hierarchical Network with Dynamic Chunking ★713 · Updated last month
- A simple and efficient Mamba implementation in pure PyTorch and MLX. ★1,317 · Updated 9 months ago
- Dream 7B, a large diffusion language model ★959 · Updated 3 weeks ago
- Muon is Scalable for LLM Training ★1,302 · Updated last month
- Training Large Language Model to Reason in a Continuous Latent Space ★1,259 · Updated last month
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation ★435 · Updated last month
- The simplest, fastest repository for training/finetuning small-sized VLMs. ★4,026 · Updated this week
- Implementing DeepSeek R1's GRPO algorithm from scratch ★1,561 · Updated 4 months ago
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ★421 · Updated last year
- Code release for DynamicTanh (DyT) ★1,010 · Updated 5 months ago
- NanoGPT (124M) in 3 minutes ★3,091 · Updated last month
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ★807 · Updated 2 months ago
- Schedule-Free Optimization in PyTorch ★2,206 · Updated 3 months ago
- PyTorch code and models for VJEPA2 self-supervised learning from video. ★2,177 · Updated 2 weeks ago
- Collection of papers on state-space models ★599 · Updated last week
- Recipes to scale inference-time compute of open models ★1,111 · Updated 3 months ago
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ★1,603 · Updated 10 months ago
- Helpful tools and examples for working with flex-attention ★970 · Updated this week