Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆253 · updated Jan 31, 2025
Alternatives and similar repositories for lolcats
Users interested in lolcats are comparing it to the libraries listed below.
- ☆68 · updated Jul 8, 2025
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf · ☆21 · updated Jul 29, 2024
- ☆14 · updated Nov 20, 2022
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models · ☆35 · updated Jun 12, 2024
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ☆250 · updated Jun 6, 2025
- 🚀 Efficient implementations of state-of-the-art linear attention models · ☆4,692 · updated this week
- Understand and test language model architectures on synthetic tasks. · ☆263 · updated Mar 22, 2026
- train with kittens! · ☆64 · updated Oct 25, 2024
- ☆126 · updated Feb 4, 2026
- Tile primitives for speedy kernels · ☆3,244 · updated Mar 17, 2026
- 🔥 A minimal training framework for scaling FLA models · ☆358 · updated Nov 15, 2025
- FlexAttention w/ FlashAttention3 Support · ☆27 · updated Oct 5, 2024
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral) · ☆35 · updated Jan 18, 2025
- ☆58 · updated Jul 9, 2024
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models · ☆239 · updated Oct 14, 2025
- A repository for research on medium-sized language models. · ☆78 · updated May 23, 2024
- HGRN2: Gated Linear RNNs with State Expansion · ☆56 · updated Aug 20, 2024
- Some preliminary explorations of Mamba's context scaling. · ☆218 · updated Feb 8, 2024
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads · ☆532 · updated Feb 10, 2025
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" · ☆110 · updated Oct 11, 2025
- Make triton easier · ☆50 · updated Jun 12, 2024
- [COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs · ☆64 · updated Mar 9, 2026
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… · ☆375 · updated Dec 12, 2024
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" · ☆170 · updated Jan 30, 2025
- PyTorch implementation of models from the Zamba2 series. · ☆189 · updated Jan 23, 2025
- Awesome Triton Resources · ☆39 · updated Apr 27, 2025
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" · ☆978 · updated Feb 5, 2026
- The Structure and Interpretation of Deep Networks Handbook · ☆14 · updated Dec 14, 2024
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" · ☆18 · updated Mar 15, 2024
- ☆166 · updated Jun 22, 2025
- ☆20 · updated May 30, 2024
- ☆29 · updated Jul 9, 2024
- Training hybrid models for dummies. · ☆29 · updated Nov 1, 2025
- ☆133 · updated Jun 6, 2025
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule · ☆524 · updated Mar 13, 2026
- Benchmark tests supporting the TiledCUDA library. · ☆18 · updated Nov 19, 2024
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… · ☆157 · updated Apr 7, 2025
- Helpful tools and examples for working with flex-attention · ☆1,161 · updated Feb 8, 2026
- Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency · ☆25 · updated Jul 31, 2024