yifu-ding / BGEMM-CUDA
A repository implementing Binary General Matrix Multiply (BGEMM) with a custom CUDA kernel. Thanks to FP6-LLM for the wheels!
☆14 Updated 5 months ago
Alternatives and similar repositories for BGEMM-CUDA:
Users who are interested in BGEMM-CUDA are comparing it to the libraries listed below:
- The official PyTorch implementation of the NeurIPS 2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L… ☆48 Updated 2 years ago
- LLM Inference with Microscaling Format ☆17 Updated 3 months ago
- ☆89 Updated last year
- ☆20 Updated 11 months ago
- DeiT implementation for Q-ViT ☆24 Updated 2 years ago
- [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning