lucidrains / titans-pytorch
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
☆1,439 · Updated 2 months ago
Alternatives and similar repositories for titans-pytorch
Users interested in titans-pytorch are comparing it to the repositories listed below.
- A Self-adaptation Framework that adapts LLMs for unseen tasks in real-time! · ☆1,136 · Updated 6 months ago
- Code for BLT research paper · ☆1,958 · Updated 3 months ago
- Official PyTorch implementation for "Large Language Diffusion Models" · ☆2,763 · Updated this week
- Muon is an optimizer for hidden layers in neural networks · ☆1,547 · Updated last month
- Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper · ☆725 · Updated last week
- Efficient implementations of state-of-the-art linear attention models · ☆3,045 · Updated this week
- Continuous Thought Machines, because thought takes time and reasoning is a process. · ☆1,261 · Updated last month
- Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States · ☆1,241 · Updated last year
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling · ☆904 · Updated 3 months ago
- Dream 7B, a large diffusion language model · ☆915 · Updated this week
- [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters · ☆569 · Updated 6 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model · ☆816 · Updated last month
- H-Net: Hierarchical Network with Dynamic Chunking · ☆657 · Updated 3 weeks ago
- A simple and efficient Mamba implementation in pure PyTorch and MLX. · ☆1,309 · Updated 8 months ago
- Training Large Language Model to Reason in a Continuous Latent Space · ☆1,249 · Updated last week
- Muon is Scalable for LLM Training · ☆1,281 · Updated 3 weeks ago
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models · ☆782 · Updated last month
- PyTorch code and models for VJEPA2 self-supervised learning from video. · ☆2,039 · Updated last week
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation · ☆405 · Updated 2 weeks ago
- Code release for DynamicTanh (DyT) · ☆1,004 · Updated 4 months ago
- Large Concept Models: Language modeling in a sentence representation space · ☆2,261 · Updated 6 months ago
- Implementing DeepSeek R1's GRPO algorithm from scratch · ☆1,537 · Updated 4 months ago
- SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. · ☆802 · Updated 3 weeks ago
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models · ☆1,312 · Updated last week
- Build high-performance AI models with modular building blocks · ☆541 · Updated this week
- MoBA: Mixture of Block Attention for Long-Context LLMs · ☆1,870 · Updated 4 months ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes · ☆1,673 · Updated last month
- Recipes to scale inference-time compute of open models · ☆1,112 · Updated 3 months ago
- An Open-source RL System from ByteDance Seed and Tsinghua AIR · ☆1,518 · Updated 3 months ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. · ☆3,907 · Updated last week