Repository for CPU Kernel Generation for LLM Inference
☆28Jul 13, 2023Updated 2 years ago
Alternatives and similar repositories for QIGen
Users that are interested in QIGen are comparing it to the libraries listed below
Sorting:
- Prototype routines for GPU quantization written using PyTorch.☆21Feb 8, 2026Updated last month
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)☆145Sep 20, 2024Updated last year
- ☆13Jun 22, 2025Updated 9 months ago
- Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM☆14Dec 27, 2023Updated 2 years ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models☆53Aug 9, 2024Updated last year
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization☆713Aug 13, 2024Updated last year
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024☆185Apr 16, 2024Updated last year
- Apply Iprompt on GLM with innovative new methods. Currently support Chinese QA, English QA and Chinese poem generation.