IST-DASLab/gptq-gguf-toolkit

Efficient non-uniform quantization with GPTQ for GGUF

☆64

Alternatives and similar repositories for gptq-gguf-toolkit

Users who are interested in gptq-gguf-toolkit are comparing it to the repositories listed below.

IST-DASLab / GSQ
Gumbel-Softmax post-training quantization for LLMs (1–3 bit scalar, INT/GGUF-compatible).
☆15 · Jul 11, 2026 · Updated last week
IST-DASLab / EvoPress
☆43 · Jun 14, 2026 · Updated last month
IST-DASLab / sparseprop
☆16 · Sep 27, 2023 · Updated 2 years ago
fishiatee / Tumera
Yet another frontend for LLM, written using .NET and WinUI 3
☆11 · Sep 14, 2025 · Updated 10 months ago
IST-DASLab / QuEST
Work in progress.
☆80 · Nov 25, 2025 · Updated 7 months ago
IST-DASLab / Quartet
☆127 · Mar 18, 2026 · Updated 4 months ago
IST-DASLab / FP-Quant
☆114 · Feb 26, 2026 · Updated 4 months ago
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆82 · Mar 18, 2026 · Updated 4 months ago
IST-DASLab / torch_cgx
Pytorch distributed backend extension with compression support
☆17 · Mar 24, 2025 · Updated last year
mzbac / open-chat
A simple frontend page to interact with an OpenAI like API
☆17 · Jan 31, 2025 · Updated last year
thad0ctor / KrunchWrapper
☆18 · Jul 1, 2025 · Updated last year
ModelCloud / Device-SMI
Self-contained Python lib with zero-dependencies that give you a unified device properties for gpu, cpu, and npu. No more calling separat…
☆16 · Jul 6, 2026 · Updated 2 weeks ago
Cornell-RelaxML / qtip
☆180 · Jun 22, 2025 · Updated last year
IST-DASLab / MatGPTQ
Code for MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization
☆22 · Feb 18, 2026 · Updated 5 months ago
reka-ai / rekaquant
☆63 · Jul 10, 2025 · Updated last year
ModelCloud / Evalution
Evalution: evolve your LLMs with better evals.
☆16 · Updated this week
IST-DASLab / MoE-Quant
Code for data-aware compression of DeepSeek models
☆75 · Dec 11, 2025 · Updated 7 months ago
Cornell-RelaxML / yaqa-quantization
☆84 · Jun 20, 2025 · Updated last year
Xingyu-Zheng / FOEM
(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
☆16 · Apr 16, 2026 · Updated 3 months ago
IST-DASLab / llmq
Quantized LLM training in pure CUDA/C++.
☆250 · Updated this week
blindTissue / logit_lens_llama_advanced
☆18 · Jun 22, 2026 · Updated 3 weeks ago
thunlp / NOSA
The official implementation of NOSA
☆19 · Jun 11, 2026 · Updated last month
tohurtv / llama.cpp-qt
Llama.cpp-qt is a Python-based GUI wrapper for the LLama.cpp server, providing a user-friendly interface for configuring and running the …
☆16 · Oct 4, 2023 · Updated 2 years ago
IST-DASLab / qutlass
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
☆191 · Updated this week
Farseer-Scaling-Law / Farseer
☆21 · Jun 12, 2025 · Updated last year
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 · Jun 3, 2025 · Updated last year
parsa-epfl / quantization-sparsity-interplay
This repo contains the code for studying the interplay between quantization and sparsity methods
☆26 · Feb 26, 2025 · Updated last year
abgulati / hf-waitress
Serving LLMs in the HF-Transformers format via a PyFlask API
☆72 · Sep 10, 2024 · Updated last year
facebookresearch / ParetoQ
This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"
☆131 · Oct 15, 2025 · Updated 9 months ago
UMass-Embodied-AGI / BudgetGuidance
[ACL'26 Findings] Steering LLM Thinking with Budget Guidance
☆32 · Feb 19, 2026 · Updated 5 months ago
violetxi / ExpRL
☆19 · Jun 16, 2026 · Updated last month
allenai / olmix
☆41 · May 26, 2026 · Updated last month
kuterd / opal_ptx
Experimental GPU language with meta-programming
☆31 · Sep 6, 2024 · Updated last year
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆278 · Nov 3, 2023 · Updated 2 years ago
sanyalsunny111 / Looped-GPT
Minimal and highly hackable implementation of Looped Transformers with GPT
☆25 · Mar 8, 2026 · Updated 4 months ago
Cornell-RelaxML / quip-sharp
☆600 · Oct 29, 2024 · Updated last year
WebChoreArena / WebChoreArena
COLM2026
☆36 · Jul 9, 2026 · Updated last week
xNul / codestral-mamba-for-vscode
Use Codestral Mamba with Visual Studio Code and the Continue extension. A local LLM alternative to GitHub Copilot.
☆31 · Jul 18, 2024 · Updated 2 years ago
MaggotHATE / Llama_chat
A chat UI for Llama.cpp
☆16 · Jun 4, 2026 · Updated last month