chenyaofo / CCA-Attention
☆19 · Updated 4 months ago
Alternatives and similar repositories for CCA-Attention
Users interested in CCA-Attention are comparing it with the repositories listed below.
- ☆49 · Updated 6 months ago
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆39 · Updated last year
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… ☆24 · Updated 5 months ago
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning (CVPR '24) ☆69 · Updated 5 months ago
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral) ☆32 · Updated 11 months ago
- ☆26 · Updated 3 weeks ago
- ☆17 · Updated 4 months ago
- ☆31 · Updated 6 months ago
- Code for merging Large Language Models ☆34 · Updated last year
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆31 · Updated 8 months ago
- The official GitHub page for the survey paper "A Survey of RWKV". ☆29 · Updated 11 months ago
- Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression." ☆17 · Updated last year
- Flash-Linear-Attention models beyond language ☆20 · Updated 3 months ago
- ☆18 · Updated 9 months ago
- ICLR 2025 ☆30 · Updated 7 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆56 · Updated 6 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆111 · Updated 2 weeks ago
- A repository for DenseSSMs ☆89 · Updated last year
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆61 · Updated 10 months ago
- User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou… ☆28 · Updated 7 months ago
- ☆62 · Updated 5 months ago
- Research work aimed at addressing the problem of modeling infinite-length context ☆29 · Updated this week
- ☆152 · Updated last year
- [NAACL 24 Oral] LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models ☆39 · Updated 11 months ago
- PyTorch implementation of StableMask (ICML'24) ☆14 · Updated last year
- ☆38 · Updated 4 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Updated last year
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆54 · Updated last year
- ☆112 · Updated 3 months ago
- A training-free approach to accelerate ViTs and VLMs by pruning redundant tokens based on similarity ☆40 · Updated 6 months ago