test-time-training / ttt-lm-kernels
Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States
☆42 Updated 4 months ago
Related projects
Alternatives and complementary repositories for ttt-lm-kernels
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆59 Updated 7 months ago
- ☆98 Updated 8 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆49 Updated 3 months ago
- A repository for DenseSSMs ☆88 Updated 7 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆44 Updated last year
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at… ☆98 Updated 5 months ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆50 Updated last week
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆28 Updated 7 months ago
- [EMNLP 2023 Main] Sparse Low-rank Adaptation of Pre-trained Language Models ☆70 Updated 8 months ago
- ☆109 Updated 4 months ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆79 Updated 2 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆61 Updated 7 months ago
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆40 Updated last week
- ☆45 Updated 4 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆33 Updated 5 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆32 Updated last month
- Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆74 Updated 5 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆118 Updated 4 months ago
- Linear Attention Sequence Parallelism (LASP) ☆64 Updated 5 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆93 Updated last month
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆47 Updated last year
- [NAACL 24 Oral] LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models ☆26 Updated 2 months ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆64 Updated 5 months ago
- Low-bit optimizers for PyTorch ☆119 Updated last year
- ☆77 Updated 4 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆123 Updated 8 months ago
- The official code for Dropping Backward Propagation (DropBP) ☆26 Updated 3 weeks ago
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer ☆55 Updated last year
- The implementation for MLSys 2023 paper: "Cuttlefish: Low-rank Model Training without All The Tuning" ☆43 Updated last year
- Sparse Backpropagation for Mixture-of-Expert Training ☆24 Updated 4 months ago