IST-DASLab/QIGen

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/IST-DASLab/QIGen)

IST-DASLab / QIGen

Repository for CPU Kernel Generation for LLM Inference

☆28

Alternatives and similar repositories for QIGen

Users that are interested in QIGen are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ilur98 / DGQ
View on GitHub
Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
☆14Dec 27, 2023Updated 2 years ago
hahnyuan / RPTQ4LLM
View on GitHub
Reorder-based post-training quantization for large language model
☆199May 17, 2023Updated 3 years ago
wuhy68 / Parameter-Efficient-MoE
View on GitHub
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆145Sep 20, 2024Updated last year
SqueezeAILab / SqueezeLLM
View on GitHub
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆722Aug 13, 2024Updated last year
IST-DASLab / QUIK
View on GitHub
Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024
☆185Apr 16, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
THUDM / GLM-iprompt
View on GitHub
Apply Iprompt on GLM with innovative new methods. Currently support Chinese QA, English QA and Chinese poem generation.
☆20Jun 16, 2022Updated 4 years ago
ModelTC / Outlier_Suppression_Plus
View on GitHub
Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…
☆52Oct 21, 2023Updated 2 years ago
Qualcomm-AI-research / BayesianBits
View on GitHub
☆22Feb 11, 2022Updated 4 years ago
AozhongZhang / MagR
View on GitHub
☆16Jun 22, 2025Updated last year
Cornell-RelaxML / QuIP
View on GitHub
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆399Feb 24, 2024Updated 2 years ago
ysbsb / awesome-quantization
View on GitHub
Awesome Quantization Paper lists with Codes
☆10Feb 24, 2021Updated 5 years ago
Twilight92z / Quantize-Watermark
View on GitHub
☆19Nov 6, 2023Updated 2 years ago
fpgaminer / GPTQ-triton
View on GitHub
GPTQ inference Triton kernel
☆323May 18, 2023Updated 3 years ago
IST-DASLab / SparseFinetuning
View on GitHub
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆43Jan 15, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
facebookresearch / LLM-QAT
View on GitHub
Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"
☆325Mar 4, 2025Updated last year
papers-submission / structured_transposable_masks
View on GitHub
Code for ICML 2021 submission
☆35Mar 24, 2021Updated 5 years ago
Aaronhuang-778 / BiLLM
View on GitHub
[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
☆235Jan 11, 2025Updated last year
NVIDIA / atex
View on GitHub
A TensorFlow Extension: GPU performance tools for TensorFlow.
☆26Jul 27, 2023Updated 2 years ago
philschmid / deep-learning-remote-runner
View on GitHub
☆16Aug 10, 2022Updated 3 years ago
JeanKaddour / NoTrainNoGain
View on GitHub
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆81Aug 30, 2023Updated 2 years ago
andrewjw / pyvobsub2srt
View on GitHub
A Python script to convert vobsub subtitles into srt format using tesseract for ocr
☆10Sep 28, 2014Updated 11 years ago
IST-DASLab / marlin
View on GitHub
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆1,109Sep 4, 2024Updated last year
Aaronhuang-778 / SliM-LLM
View on GitHub
[ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
☆62Aug 9, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
Vahe1994 / SpQR
View on GitHub
☆554Feb 8, 2026Updated 5 months ago
TobiasZawada / preview-dvisvgm
View on GitHub
Generate SVG images for `preview-latex`
☆13Dec 25, 2021Updated 4 years ago
DaertML / context_distillation
View on GitHub
Framework to achieve context distillation in LLMs
☆15Nov 24, 2023Updated 2 years ago
deJQK / FracBits
View on GitHub
Neural Network Quantization With Fractional Bit-widths
☆11Feb 19, 2021Updated 5 years ago
locuslab / wanda
View on GitHub
A simple and effective LLM pruning approach.
☆868Aug 9, 2024Updated last year
Guangxuan-Xiao / torch-int
View on GitHub
This repository contains integer operators on GPUs for PyTorch.
☆235Sep 29, 2023Updated 2 years ago
mlc-ai / llm-perf-bench
View on GitHub
☆121Apr 22, 2024Updated 2 years ago
ROCm / rocr_debug_agent
View on GitHub
The ROCdebug-agent is a library that can be loaded by ROCm Platform Runtime to provide some debugging functionality.
☆33Updated this week
multiplexerai / Namespace-RAG
View on GitHub
☆13Feb 18, 2024Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
nusnlp / greco
View on GitHub
The official code for the "System Combination via Quality Estimation for Grammatical Error Correction" paper, published in EMNLP 2023.
☆16Jan 24, 2026Updated 5 months ago
davisyoshida / jax-gptq
View on GitHub
JAX implementation of GPTQ quantization algorithm
☆10Jul 19, 2023Updated 3 years ago
intel / neural-speed
View on GitHub
An innovative library for efficient LLM inference via low-bit quantization
☆352Aug 30, 2024Updated last year
bigcode-project / bigcode-inference-benchmark
View on GitHub
☆19Aug 10, 2024Updated last year
IST-DASLab / OBC
View on GitHub
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
☆132Jul 11, 2023Updated 3 years ago
upiterbarg / diff_history
View on GitHub
[ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus)
☆20Aug 20, 2024Updated last year
IST-DASLab / qmoe
View on GitHub
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆278Nov 3, 2023Updated 2 years ago