Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆256Jan 31, 2025Updated last year
Alternatives and similar repositories for lolcats
Users that are interested in lolcats are comparing it to the libraries listed below.
- ☆70Jul 8, 2025Updated 10 months ago
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf☆21Jul 29, 2024Updated last year
- ☆14Nov 20, 2022Updated 3 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated last year
- 🚀 Efficient implementations for emerging model architectures☆5,032May 1, 2026Updated last week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"☆252Jun 6, 2025Updated 11 months ago
- Understand and test language model architectures on synthetic tasks.☆265Mar 22, 2026Updated last month
- train with kittens!☆64Oct 25, 2024Updated last year
- Tile primitives for speedy kernels☆3,336Apr 29, 2026Updated last week
- ☆130Feb 4, 2026Updated 3 months ago
- 🔥 A minimal training framework for scaling FLA models☆385Apr 22, 2026Updated 2 weeks ago
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral)☆36Jan 18, 2025Updated last year
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models☆241Oct 14, 2025Updated 6 months ago
- A repository for research on medium sized language models.☆78May 23, 2024Updated last year
- HGRN2: Gated Linear RNNs with State Expansion☆57Aug 20, 2024Updated last year
- ☆59Jul 9, 2024Updated last year
- Some preliminary explorations of Mamba's context scaling.☆219Feb 8, 2024Updated 2 years ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)☆25Jun 6, 2024Updated last year
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads☆543Feb 10, 2025Updated last year
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"☆112Oct 11, 2025Updated 6 months ago
- Make triton easier☆50Jun 12, 2024Updated last year
- [COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs☆65Mar 9, 2026Updated 2 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…☆376Dec 12, 2024Updated last year
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"☆171Jan 30, 2025Updated last year
- PyTorch implementation of models from the Zamba2 series.☆194Jan 23, 2025Updated last year
- Awesome Triton Resources☆41Apr 27, 2025Updated last year
- The Structure and Interpretation of Deep Networks Handbook☆14Dec 14, 2024Updated last year
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"☆995Feb 5, 2026Updated 3 months ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆18Mar 15, 2024Updated 2 years ago
- ☆169Jun 22, 2025Updated 10 months ago
- ☆20May 30, 2024Updated last year
- ☆29Jul 9, 2024Updated last year
- Training hybrid models for dummies.☆29Nov 1, 2025Updated 6 months ago
- ☆136Jun 6, 2025Updated 11 months ago
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule☆558Mar 13, 2026Updated last month
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…☆156Apr 7, 2025Updated last year
- Helpful tools and examples for working with flex-attention☆1,182Apr 13, 2026Updated 3 weeks ago