MAGICS-LAB / OutEffHop
[ICML 2024] Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
☆21 · Updated 4 months ago
Alternatives and similar repositories for OutEffHop
Users that are interested in OutEffHop are comparing it to the libraries listed below
- Fast and memory-efficient exact attention ☆69 · Updated 6 months ago
- An extension to the GaLore paper, to perform Natural Gradient Descent in a low-rank subspace ☆17 · Updated 10 months ago
- SQUEEZED ATTENTION: Accelerating Long Prompt LLM Inference ☆52 · Updated 9 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆111 · Updated 10 months ago
- ☆82 · Updated last year
- ☆141 · Updated 6 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆85 · Updated last month
- ☆149 · Updated 2 months ago
- Transformers components but in Triton ☆34 · Updated 3 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆91 · Updated 2 months ago
- QuIP quantization ☆58 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆114 · Updated 2 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆76 · Updated 10 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆129 · Updated 9 months ago
- Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its… ☆22 · Updated 11 months ago
- [EMNLP 2024] Quantize LLMs to extremely low bit-widths, and finetune the quantized LLMs ☆14 · Updated last year
- Experiments on Multi-Head Latent Attention ☆95 · Updated last year
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ☆135 · Updated 6 months ago
- ☆15 · Updated 5 months ago
- Code repository for the ICLR 2025 paper "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid" ☆20 · Updated 6 months ago
- ☆53 · Updated 10 months ago
- RADLADS training code ☆27 · Updated 3 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLMs ☆165 · Updated last year
- Code for studying the super weight in LLMs ☆117 · Updated 9 months ago
- The official repository of Quamba1 [ICLR 2025] & Quamba2 [ICML 2025] ☆59 · Updated 2 months ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆224 · Updated 7 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆224 · Updated last month
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆226 · Updated 9 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated 10 months ago
- 16-fold memory access reduction with nearly no loss ☆104 · Updated 5 months ago