lucidrains / titans-pytorch
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
⭐ 1,887 · Updated last week
Alternatives and similar repositories for titans-pytorch
Users who are interested in titans-pytorch compare it to the libraries listed below.
- Muon is an optimizer for hidden layers in neural networks · ⭐ 2,201 · Updated last month
- A Self-adaptation Framework that adapts LLMs for unseen tasks in real-time! · ⭐ 1,179 · Updated 11 months ago
- Code for BLT research paper · ⭐ 2,024 · Updated 2 months ago
- Continuous Thought Machines, because thought takes time and reasoning is a process. · ⭐ 1,712 · Updated 2 weeks ago
- Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States · ⭐ 1,307 · Updated last year
- Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper · ⭐ 791 · Updated 5 months ago
- Official PyTorch implementation for "Large Language Diffusion Models" · ⭐ 3,473 · Updated 2 months ago
- Efficient implementations of state-of-the-art linear attention models · ⭐ 4,243 · Updated this week
- [ICLR 2025 Spotlight 🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters · ⭐ 581 · Updated 11 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model · ⭐ 859 · Updated 2 weeks ago
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling · ⭐ 939 · Updated 2 months ago
- H-Net: Hierarchical Network with Dynamic Chunking · ⭐ 801 · Updated last month
- Training Large Language Model to Reason in a Continuous Latent Space · ⭐ 1,449 · Updated 5 months ago
- Muon is Scalable for LLM Training · ⭐ 1,397 · Updated 5 months ago
- Implementing DeepSeek R1's GRPO algorithm from scratch · ⭐ 1,740 · Updated 8 months ago
- Dream 7B, a large diffusion language model · ⭐ 1,139 · Updated last month
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025) · ⭐ 530 · Updated 3 months ago
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models · ⭐ 1,556 · Updated 2 months ago
- A simple and efficient Mamba implementation in pure PyTorch and MLX. · ⭐ 1,405 · Updated last year
- dLLM: Simple Diffusion Language Modeling · ⭐ 1,566 · Updated last week
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States · ⭐ 446 · Updated 2 months ago
- Build high-performance AI models with modular building blocks · ⭐ 576 · Updated this week
- [ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models · ⭐ 940 · Updated 6 months ago
- OLMoE: Open Mixture-of-Experts Language Models · ⭐ 950 · Updated 3 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" · ⭐ 952 · Updated 9 months ago
- Large Concept Models: Language modeling in a sentence representation space · ⭐ 2,327 · Updated 11 months ago
- ⭐ 658 · Updated 9 months ago
- Recipes to scale inference-time compute of open models · ⭐ 1,123 · Updated 7 months ago
- Helpful tools and examples for working with flex-attention · ⭐ 1,108 · Updated this week
- PyTorch code and models for VJEPA2 self-supervised learning from video. · ⭐ 2,759 · Updated 4 months ago