Repository for CPU Kernel Generation for LLM Inference
☆28 Jul 13, 2023 Updated 2 years ago
Alternatives and similar repositories for QIGen
Users who are interested in QIGen are comparing it to the libraries listed below.
- ☆13 Jun 22, 2025 Updated 8 months ago
- Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM ☆14 Dec 27, 2023 Updated 2 years ago
- Grams: Gradient Descent with Adaptive Momentum Scaling (ICLR 2025 Workshop) ☆17 Mar 6, 2025 Updated 11 months ago
- Implementation of an active inference capsule ☆16 Jun 2, 2021 Updated 4 years ago
- Reorder-based post-training quantization for large language model ☆199 May 17, 2023 Updated 2 years ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24) ☆145 Sep 20, 2024 Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆81 Aug 30, 2023 Updated 2 years ago
- Apply Iprompt on GLM with innovative new methods. Currently supports Chinese QA, English QA and Chinese poem generation. ☆20 Jun 16, 2022 Updated 3 years ago
- A multi-page application to visualize and predict Covid numbers ☆22 May 19, 2025 Updated 9 months ago
- A TensorFlow Extension: GPU performance tools for TensorFlow. ☆26 Jul 27, 2023 Updated 2 years ago
- ☆19 Nov 6, 2023 Updated 2 years ago
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆713 Aug 13, 2024 Updated last year
- ☆21 Feb 11, 2022 Updated 4 years ago
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti… ☆50 Oct 21, 2023 Updated 2 years ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆228 Jan 11, 2025 Updated last year
- python package of rocm-smi-lib ☆24 Dec 15, 2025 Updated 2 months ago
- ☆160 Sep 15, 2023 Updated 2 years ago
- ☆34 Aug 23, 2023 Updated 2 years ago
- quick playground to animate pippin ☆14 Nov 11, 2024 Updated last year
- A user-friendly Command & Control (C&C) web platform for remote monitoring, management, and task automation across multiple devices. ☆14 Dec 15, 2024 Updated last year
- ☆31 Mar 23, 2024 Updated last year
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024 ☆184 Apr 16, 2024 Updated last year
- ☆34 Jun 12, 2025 Updated 8 months ago
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens. ☆1,025 Sep 4, 2024 Updated last year
- ☆120 Apr 22, 2024 Updated last year
- A red teaming agent ☆18 Oct 15, 2025 Updated 4 months ago
- A simple and effective LLM pruning approach. ☆849 Aug 9, 2024 Updated last year
- ☆553 Feb 8, 2026 Updated 3 weeks ago
- GPTQ inference Triton kernel ☆321 May 18, 2023 Updated 2 years ago
- ☆85 Jan 23, 2025 Updated last year
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs. ☆889 Nov 26, 2025 Updated 3 months ago
- Bayesian Low-Rank Adaptation for Large Language Models ☆36 Jun 22, 2024 Updated last year
- This repository contains integer operators on GPUs for PyTorch. ☆237 Sep 29, 2023 Updated 2 years ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆39 Mar 11, 2024 Updated last year
- ☆235 Jun 11, 2024 Updated last year
- ☆33 Apr 12, 2021 Updated 4 years ago
- ☆14 Jan 23, 2026 Updated last month
- A nonparametric variational information bottleneck (NVIB) layer in Pytorch ☆11 Apr 15, 2025 Updated 10 months ago
- A Library for Scaling Mixed-Integer Optimization-Based Machine Learning. ☆12 Jun 24, 2024 Updated last year