gmlwns2000 / sea-attention
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
☆11 · Updated 5 months ago
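For orientation, below is a minimal, hypothetical sketch of the idea named in the title: estimate attention scores cheaply in a low-rank space, keep only the top-scoring keys per query as a sparse mask, and evaluate exact attention on that support. The function name, the random projection estimator, and the top-k heuristic are illustrative assumptions, not the repository's actual API or algorithm (SEA learns its estimator; this dense version is only for clarity).

```python
import torch


def sparse_attention_with_estimated_mask(q, k, v, proj_dim=32, keep=8):
    """Toy sketch: estimate an attention mask cheaply, then attend sparsely.

    q, k, v: (batch, heads, seq_len, head_dim). All names here are
    illustrative assumptions, not the sea-attention repository's API.
    """
    B, H, T, D = q.shape
    keep = min(keep, T)
    scale = D ** -0.5

    # Step 1: cheap score estimate via a shared random low-rank projection.
    # (A real method would learn this estimator and avoid materializing
    # the full T x T matrix; this dense version is only for clarity.)
    proj = torch.randn(D, proj_dim, device=q.device, dtype=q.dtype) * proj_dim ** -0.5
    est = (q @ proj) @ (k @ proj).transpose(-1, -2)  # (B, H, T, T)

    # Step 2: turn the estimate into a binary mask (top-`keep` keys per query).
    idx = est.topk(keep, dim=-1).indices
    mask = torch.zeros(B, H, T, T, dtype=torch.bool, device=q.device)
    mask.scatter_(-1, idx, True)

    # Step 3: exact attention restricted to the estimated sparse support.
    scores = (q @ k.transpose(-1, -2)) * scale
    scores = scores.masked_fill(~mask, float("-inf"))
    return scores.softmax(dim=-1) @ v


if __name__ == "__main__":
    q = k = v = torch.randn(1, 2, 16, 64)
    out = sparse_attention_with_estimated_mask(q, k, v)
    print(out.shape)  # torch.Size([1, 2, 16, 64])
```

Each query row keeps at least `keep` candidates, so the softmax never sees an all-masked row; the payoff of mask estimation comes when the sparse support lets the exact attention skip most key/value pairs.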
Alternatives and similar repositories for sea-attention
Users interested in sea-attention are comparing it to the libraries listed below
- ☆24 · Updated last year
- ☆30 · Updated last year
- ☆62 · Updated 2 years ago
- ☆53 · Updated last year
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference · ☆54 · Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization · ☆38 · Updated last year
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) · ☆30 · Updated 2 months ago
- ☆16 · Updated last year
- LLM Inference with Microscaling Format · ☆33 · Updated last year
- Residual vector quantization for KV cache compression in large language models · ☆10 · Updated last year
- Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer" · ☆114 · Updated 2 years ago
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer · ☆30 · Updated 2 years ago
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023) · ☆31 · Updated 2 years ago
- [ICML 2024 Oral] This project is the official implementation of our paper "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention" · ☆67 · Updated last year
- [COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models" · ☆61 · Updated 4 months ago
- ☆31 · Updated last year
- ☆60 · Updated last year
- ☆45 · Updated last year
- ☆26 · Updated last week
- ☆21 · Updated 2 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models · ☆35 · Updated last year
- ☆21 · Updated last year
- EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation · ☆27 · Updated 4 months ago
- Triton implementation of bi-directional (non-causal) linear attention · ☆56 · Updated 10 months ago
- This project is the official implementation of our ICLR 2022 paper "BiBERT: Accurate Fully Binarized BERT" · ☆89 · Updated 2 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs · ☆20 · Updated last year
- Official PyTorch implementation of CD-MOE · ☆12 · Updated 8 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models" · ☆30 · Updated last year
- The official repository of Quamba1 [ICLR 2025] & Quamba2 [ICML 2025] · ☆62 · Updated 5 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models · ☆80 · Updated last year