tsinghua-ideal / Twilight
Adaptive Attention Sparsity with Hierarchical Top-p Pruning
☆19 · Updated 6 months ago
Alternatives and similar repositories for Twilight
Users who are interested in Twilight are comparing it to the libraries listed below.
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆47 · Updated 3 weeks ago
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆24 · Updated 10 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)