Dao-AILab/fast-hadamard-transform

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Dao-AILab/fast-hadamard-transform)

Dao-AILab / fast-hadamard-transform

Fast Hadamard transform in CUDA, with a PyTorch interface

☆342

Alternatives and similar repositories for fast-hadamard-transform

Users that are interested in fast-hadamard-transform are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

spcl / QuaRot
View on GitHub
Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.
☆524Nov 26, 2024Updated last year
facebookresearch / SpinQuant
View on GitHub
Code repo for the paper "SpinQuant LLM quantization with learned rotations"
☆418Feb 14, 2025Updated last year
IST-DASLab / FP-Quant
View on GitHub
☆116Feb 26, 2026Updated 5 months ago
Qualcomm-AI-research / gptvq
View on GitHub
☆42Mar 28, 2024Updated 2 years ago
ChenMnZ / PrefixQuant
View on GitHub
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆176Nov 26, 2025Updated 8 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
IST-DASLab / qutlass
View on GitHub
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
☆193Jul 20, 2026Updated last week
TiledTensor / TiledLower
View on GitHub
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13Nov 23, 2024Updated last year
mit-han-lab / fouroversix
View on GitHub
Code for the papers: “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” and “Adaptive Block-Scaled Data Types”
☆199Apr 21, 2026Updated 3 months ago
ruikangliu / FlatQuant
View on GitHub
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223Nov 25, 2025Updated 8 months ago
Intelligent-Computing-Lab-Panda / GPTAQ
View on GitHub
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
☆93Jul 28, 2025Updated last year
ChenMnZ / INT_vs_FP
View on GitHub
[ICML 2026]A framework to compare low-bit integer and float-point formats
☆81May 6, 2026Updated 2 months ago
Cornell-RelaxML / quip-sharp
View on GitHub
☆605Oct 29, 2024Updated last year
IST-DASLab / marlin
View on GitHub
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆1,111Sep 4, 2024Updated last year
IST-DASLab / Quartet
View on GitHub
☆127Mar 18, 2026Updated 4 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
Dao-AILab / quack
View on GitHub
A Quirky Assortment of CuTe Kernels
☆1,076Updated this week
ByteDance-Seed / decoupleQ
View on GitHub
A quantization algorithm for LLM
☆151Jun 21, 2024Updated 2 years ago
BrotherHappy / OSTQuant
View on GitHub
[ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt…
☆94Apr 8, 2025Updated last year
Dao-AILab / sonic-moe
View on GitHub
Accelerating MoE with IO and Tile-aware Optimizations
☆732Jul 4, 2026Updated 3 weeks ago
mit-han-lab / omniserve
View on GitHub
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆852Mar 6, 2025Updated last year
Cornell-RelaxML / qtip
View on GitHub
☆181Jun 22, 2025Updated last year
ModelTC / LightCompress
View on GitHub
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
☆736May 14, 2026Updated 2 months ago
HanGuo97 / flute
View on GitHub
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
☆391Apr 13, 2025Updated last year
mit-han-lab / smoothquant
View on GitHub
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆1,674Jul 12, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
efeslab / Atom
View on GitHub
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆344Jul 2, 2024Updated 2 years ago
OpenGVLab / EfficientQAT
View on GitHub
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆342Apr 10, 2026Updated 3 months ago
jy-yuan / KIVI
View on GitHub
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆423Nov 20, 2025Updated 8 months ago
JingyangXiang / DFRot
View on GitHub
[COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; 知乎：https://zhuanlan.zhihu.c…
☆30Mar 5, 2025Updated last year
microsoft / BitBLAS
View on GitHub
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆769Aug 6, 2025Updated 11 months ago
SNU-ARC / any-precision-llm
View on GitHub
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆130Jul 4, 2025Updated last year
pprp / Awesome-LLM-Quantization
View on GitHub
Awesome list for LLM quantization
☆436Apr 20, 2026Updated 3 months ago
microsoft / VPTQ
View on GitHub
VPTQ, A Flexible and Extreme low-bit quantization algorithm
☆682Apr 25, 2025Updated last year
zhuzilin / flash-attention-with-sink
View on GitHub
☆37Aug 7, 2025Updated 11 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
facebookresearch / ParetoQ
View on GitHub
This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"
☆131Oct 15, 2025Updated 9 months ago
HazyResearch / ThunderKittens
View on GitHub
Tile primitives for speedy kernels
☆3,566Jul 13, 2026Updated 2 weeks ago
HandH1998 / QQQ
View on GitHub
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
☆158Aug 21, 2025Updated 11 months ago
FibonaccciYan / Adamas
View on GitHub
Adamas: Hadamard Sparse Attention for Efficient Long-context Inference
☆15May 19, 2026Updated 2 months ago
IST-DASLab / gptq
View on GitHub
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,342Mar 27, 2024Updated 2 years ago
dropbox / gemlite
View on GitHub
Fast low-bit matmul kernels in Triton
☆477Jul 15, 2026Updated 2 weeks ago
OpenGVLab / OmniQuant
View on GitHub
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆903Nov 26, 2025Updated 8 months ago