lucidrains / titans-pytorch
Unofficial implementation of Titans, SOTA memory for transformers, in Pytorch
☆1,451 · Updated 3 months ago
Alternatives and similar repositories for titans-pytorch
Users who are interested in titans-pytorch are comparing it to the libraries listed below.
- Code for BLT research paper ☆1,980 · Updated 3 months ago
- A Self-adaptation Framework that adapts LLMs for unseen tasks in real-time! ☆1,141 · Updated 7 months ago
- Muon is an optimizer for hidden layers in neural networks ☆1,672 · Updated 2 months ago
- Continuous Thought Machines, because thought takes time and reasoning is a process. ☆1,286 · Updated last month
- Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆1,248 · Updated last year
- Official PyTorch implementation for "Large Language Diffusion Models" ☆2,864 · Updated this week
- Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper ☆740 · Updated 3 weeks ago
- [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆570 · Updated 7 months ago
- H-Net: Hierarchical Network with Dynamic Chunking ☆702 · Updated last month
- Pretraining and inference code for a large-scale depth-recurrent language model ☆826 · Updated last week
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,561 · Updated 4 months ago
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆908 · Updated 4 months ago
- A simple and efficient Mamba implementation in pure PyTorch and MLX. ☆1,318 · Updated 9 months ago
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,341 · Updated 3 weeks ago
- Efficient implementations of state-of-the-art linear attention models ☆3,110 · Updated this week
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation ☆435 · Updated last month
- Dream 7B, a large diffusion language model ☆959 · Updated 3 weeks ago
- Muon is Scalable for LLM Training ☆1,302 · Updated last month
- Build high-performance AI models with modular building blocks ☆548 · Updated last week
- A suite of image and video neural tokenizers ☆1,668 · Updated 7 months ago
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆807 · Updated 2 months ago
- Code release for DynamicTanh (DyT) ☆1,010 · Updated 5 months ago
- Recipes to scale inference-time compute of open models ☆1,111 · Updated 3 months ago
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,259 · Updated last month
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆421 · Updated last year
- NanoGPT (124M) in 3 minutes ☆3,091 · Updated last month
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,177 · Updated 2 weeks ago
- Large Concept Models: Language modeling in a sentence representation space ☆2,277 · Updated 7 months ago
- [ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation ☆848 · Updated 11 months ago
- Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI ☆1,201 · Updated 2 months ago