ai-in-pm / Titans---Learning-to-Memorize-at-Test-Time
Titans - Learning to Memorize at Test Time
☆55 Updated 11 months ago
Alternatives and similar repositories for Titans---Learning-to-Memorize-at-Test-Time
Users who are interested in Titans---Learning-to-Memorize-at-Test-Time compare it to the repositories listed below.
- Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆78 Updated last year
- PyTorch implementation of Titans. ☆31 Updated 11 months ago
- Just a repository that will house some MLPs and their variants, so to avoid having to reimplement them again and again for different proj… ☆44 Updated last week
- Geometric-Mean Policy Optimization ☆96 Updated last month
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 Updated last year
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆28 Updated last year
- [ICLR 2025 Spotlight] Official Implementation for ToST (Token Statistics Transformer) ☆128 Updated 10 months ago
- The official repo of continuous speculative decoding ☆31 Updated 9 months ago
- ☆68 Updated last year
- Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs" ☆20 Updated 7 months ago
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ☆47 Updated 5 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆67 Updated last year
- This repo contains the source code for VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks (NeurIPS 2024). ☆42 Updated last year
- Triton implementation of bi-directional (non-causal) linear attention ☆60 Updated 11 months ago
- [NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models ☆128 Updated 7 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆50 Updated 4 months ago
- [ICLR 2025] Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization ☆20 Updated 3 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆17 Updated 10 months ago
- This project implements the Titans architecture from the paper "Titans: Learning to Memorize at Test Time" for market data prediction. ☆11 Updated 11 months ago
- Defeating the Training-Inference Mismatch via FP16 ☆172 Updated last month
- implementations and experimentation on mHC by deepseek - https://arxiv.org/abs/2512.24880 ☆202 Updated last week
- Implementation of a transformer for reinforcement learning using `x-transformers` ☆72 Updated 3 months ago
- Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis" ☆57 Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆134 Updated 3 weeks ago
- [ICML 2025 Oral] Mixture of Lookup Experts ☆61 Updated last month
- MobileLLM-R1 ☆72 Updated 3 months ago
- The official github repo for "Diffusion Language Models are Super Data Learners". ☆215 Updated 2 months ago
- PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers" ☆93 Updated 2 months ago
- ☆51 Updated 8 months ago
- [ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆98 Updated last year