PingchengDong / GQA-LUT
View external linksLinks

The official implementation of the DAC 2024 paper GQA-LUT

☆20

Alternatives and similar repositories for GQA-LUT

Users that are interested in GQA-LUT are comparing it to the libraries listed below

Sorting:

HuangOwen / RoLoRA
View on GitHub
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆37Sep 24, 2024Updated last year
nbasyl / OFQ
View on GitHub
The official implementation of the ICML 2023 paper OFQ-ViT
☆38Oct 3, 2023Updated 2 years ago
HuangOwen / Quantization-Variation
View on GitHub
[TMLR] Official PyTorch implementation of paper "Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precisio…
☆48Sep 27, 2024Updated last year
Lizn-zn / Nesy-Programming
View on GitHub
☆10Oct 28, 2024Updated last year
HuangOwen / QAT-ACS
View on GitHub
[TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection"
☆37Aug 20, 2024Updated last year
schwartz-lab-NLP / Tokens2Words
View on GitHub
☆15Apr 2, 2025Updated 10 months ago
TUDa-HWAI / Basis_Sharing
View on GitHub
☆18Oct 2, 2024Updated last year
GATECH-EIC / ShiftAddViT
View on GitHub
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
☆30Dec 6, 2023Updated 2 years ago
lutnn / blink-mm
View on GitHub
☆16Jul 24, 2023Updated 2 years ago
IST-DASLab / QuEST
View on GitHub
Work in progress.
☆79Nov 25, 2025Updated 2 months ago
IST-DASLab / gemm-fp8
View on GitHub
High Performance FP8 GEMM Kernels for SM89 and later GPUs.
☆20Jan 24, 2025Updated last year
chihhuiho / yoro
View on GitHub
☆16Nov 14, 2022Updated 3 years ago
alessiodevoto / l2compress
View on GitHub
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆18Dec 13, 2024Updated last year
kst179 / fused-attention
View on GitHub
Fast and low-memory attention layer written in CUDA
☆20Jul 14, 2023Updated 2 years ago
hyintell / LLMSymbolic
View on GitHub
☆22Feb 29, 2024Updated last year
parsa-epfl / quantization-sparsity-interplay
View on GitHub
This repo contains the code for studying the interplay between quantization and sparsity methods
☆26Feb 26, 2025Updated 11 months ago
lliai / D2MoE
View on GitHub
D^2-MoE: Delta Decompression for MoE-based LLMs Compression
☆72Mar 25, 2025Updated 10 months ago
jxiw / MambaByte
View on GitHub
[CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model
☆24Oct 12, 2024Updated last year
zhangsichengsjtu / AFPQ
View on GitHub
AFPQ code implementation
☆23Nov 6, 2023Updated 2 years ago
INV-WZQ / SparseD
View on GitHub
[ICLR 2026] SparseD: Sparse Attention for Diffusion Language Models
☆57Oct 7, 2025Updated 4 months ago
pilancilab / caldera
View on GitHub
Compressing Large Language Models using Low Precision and Low Rank Decomposition
☆106Nov 24, 2025Updated 2 months ago
yunlong10 / VidComposition
View on GitHub
[CVPR 2025] VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
☆29May 10, 2025Updated 9 months ago
yxli2123 / LoftQ
View on GitHub
☆235Jun 11, 2024Updated last year
AIoT-MLSys-Lab / SVD-LLM
View on GitHub
[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2
☆281Aug 28, 2025Updated 5 months ago
sjtu-zhao-lab / SALO
View on GitHub
An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences
☆31Mar 7, 2024Updated last year
lifuguan / GP-NeRF
View on GitHub
[CVPR 2024 Highlight] GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
☆27Jul 26, 2024Updated last year
hyunwoo137 / EDAFormer
View on GitHub
Official Pytorch implementations for "Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation"(EC…
☆33Mar 15, 2025Updated 11 months ago
GATECH-EIC / Linearized-LLM
View on GitHub
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆35Jun 12, 2024Updated last year
OpenBitSys / BitDistiller
View on GitHub
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
☆134May 16, 2024Updated last year
GATECH-EIC / ViTCoD
View on GitHub
[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
☆127Jun 27, 2023Updated 2 years ago
UCSB-NLP-Chang / ThinkPrune
View on GitHub
☆46Sep 27, 2025Updated 4 months ago
bryanchrist / llama2-70b
View on GitHub
Codebase for fine-tuning Llama2 70B to generate math test questions and answers.
☆11Aug 30, 2024Updated last year
Qualcomm-AI-research / gptvq
View on GitHub
☆40Mar 28, 2024Updated last year
ise-uiuc / xft
View on GitHub
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
☆35Jul 2, 2024Updated last year
GATECH-EIC / Castling-ViT
View on GitHub
[CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference
☆30Mar 14, 2024Updated last year
abdelfattah-lab / xKV
View on GitHub
xKV: Cross-Layer SVD for KV-Cache Compression
☆43Nov 30, 2025Updated 2 months ago
naver-aics / lut-gemm
View on GitHub
☆83Apr 1, 2024Updated last year
thu-nics / qllm-eval
View on GitHub
Code Repository of Evaluating Quantized Large Language Models
☆135Sep 8, 2024Updated last year
nbasyl / LLM-FP4
View on GitHub
The official implementation of the EMNLP 2023 paper LLM-FP4
☆220Dec 15, 2023Updated 2 years ago

PingchengDong / GQA-LUTView external linksLinks

Alternatives and similar repositories for GQA-LUT

PingchengDong / GQA-LUT
View external linksLinks