Token Omission Via Attention
☆127 · Oct 13, 2024 · Updated last year
Alternatives and similar repositories for TOVA
Users that are interested in TOVA are comparing it to the libraries listed below
- Linear Attention Sequence Parallelism (LASP) ☆88 · Jun 4, 2024 · Updated last year
- ☆52 · May 13, 2024 · Updated last year
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated last year
- Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262) ☆62 · Feb 13, 2024 · Updated 2 years ago
- A repository for research on medium-sized language models ☆78 · May 23, 2024 · Updated last year
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆665 · Jun 1, 2024 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆280 · Nov 3, 2023 · Updated 2 years ago
- Implementation of Spectral State Space Models ☆16 · Feb 23, 2024 · Updated 2 years ago
- [ICML 2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen ☆17 · Sep 7, 2024 · Updated last year
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆358 · Nov 20, 2025 · Updated 3 months ago
- Official implementation for "Extending LLMs’ Context Window with 100 Samples" ☆81 · Jan 18, 2024 · Updated 2 years ago
- ☆352 · Apr 2, 2024 · Updated last year
- Yet another frontend for LLMs, written in .NET and WinUI 3 ☆10 · Sep 14, 2025 · Updated 5 months ago
- ☆13 · May 21, 2023 · Updated 2 years ago
- ☆301 · Jul 10, 2025 · Updated 7 months ago
- ☆84 · Nov 10, 2025 · Updated 3 months ago
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆231 · Aug 2, 2024 · Updated last year
- [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models ☆503 · Aug 1, 2024 · Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆446 · Oct 16, 2024 · Updated last year
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆59 · Nov 24, 2024 · Updated last year
- ☆13 · Jan 15, 2025 · Updated last year
- Source code of "Reinforcement Learning with Token-level Feedback for Controllable Text Generation" (NAACL 2024) ☆17 · Dec 8, 2024 · Updated last year
- ☆159 · Feb 15, 2025 · Updated last year
- Code for the PAPA paper ☆27 · Nov 8, 2022 · Updated 3 years ago
- Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models" ☆169 · Jan 16, 2025 · Updated last year
- ☆32 · Jan 1, 2024 · Updated 2 years ago
- Fast and differentiable hidden Markov model in C++ ☆19 · Jan 20, 2023 · Updated 3 years ago
- ☆201 · Dec 4, 2023 · Updated 2 years ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆138 · Apr 30, 2024 · Updated last year
- ☆235 · Jun 11, 2024 · Updated last year
- ☆40 · Jan 5, 2024 · Updated 2 years ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Apr 17, 2024 · Updated last year
- Keyformer proposes KV cache reduction through key-token identification, without the need for fine-tuning ☆57 · Mar 26, 2024 · Updated last year
- [NeurIPS 2023 Spotlight] Official implementation of HGRN from our NeurIPS 2023 paper "Hierarchically Gated Recurrent Neural Network for Se…" ☆67 · Apr 24, 2024 · Updated last year
- An extension to the GaLore paper, performing Natural Gradient Descent in a low-rank subspace ☆18 · Oct 21, 2024 · Updated last year
- PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24) ☆63 · Apr 18, 2024 · Updated last year
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆524 · Feb 10, 2025 · Updated last year
- ☆36 · Oct 10, 2024 · Updated last year
- PyTorch implementation of our paper accepted by ICML 2024, "CaM: Cache Merging for Memory-efficient LLMs Inference" ☆47 · Jun 19, 2024 · Updated last year