Efficient-ML/Qwen3-Quantization-Toolkit

☆79

Alternatives and similar repositories for Qwen3-Quantization-Toolkit

Users who are interested in Qwen3-Quantization-Toolkit are comparing it to the libraries listed below.

Xingyu-Zheng / BinaryDM
(ICLR 2025) BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
☆25 · Oct 4, 2024 · Updated last year
zhuhanqing / Lightening-Transformer-AE
Artifact evaluation for HPCA'24 paper Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accele…
☆11 · Mar 3, 2024 · Updated 2 years ago
snu-mllab / GuidedQuant
Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
☆53 · Apr 13, 2026 · Updated 2 months ago
YanjingLi0202 / Bi-ViT
The official implementation of the AAAI 2024 paper Bi-ViT.
☆13 · Dec 18, 2023 · Updated 2 years ago
HuangOwen / RoLoRA
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆40 · Sep 24, 2024 · Updated last year
GATECH-EIC / SuperTickets
[ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning
☆20 · Jul 7, 2022 · Updated 3 years ago
UNITES-Lab / C2R-MoE
[NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…
☆16 · Feb 4, 2025 · Updated last year
Aaronhuang-778 / SliM-LLM
[ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
☆62 · Aug 9, 2024 · Updated last year
jiaqileng / quantum-hamiltonian-descent
Quantum Hamiltonian Descent: numerical simulation, real-machine deployment, and benchmarking
☆14 · Jan 16, 2024 · Updated 2 years ago
Lucanyc / VISTA-Gym
☆26 · Mar 17, 2026 · Updated 3 months ago
Xingyu-Zheng / BiDM
(NeurIPS 2024) BiDM: Pushing the Limit of Quantization for Diffusion Models
☆22 · Nov 20, 2024 · Updated last year
Yarayx / livelongbench
The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…
☆12 · Jun 28, 2025 · Updated last year
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆20 · Feb 27, 2025 · Updated last year
zyxxmu / Bi-Mask
PyTorch implementation of our paper accepted by ICML 2023 -- "Bi-directional Masks for Efficient N:M Sparse Training"
☆13 · Jun 7, 2023 · Updated 3 years ago
ludc506 / InternVL-X
☆16 · Mar 26, 2025 · Updated last year
chengtao-lv / PTQ4SAM
[CVPR 2024] PTQ4SAM: Post-Training Quantization for Segment Anything
☆86 · Jun 26, 2024 · Updated 2 years ago
DravenALG / ReSTE
(ICCV 2023) Official implementation of Rectified Straight Through Estimator (ReSTE).
☆34 · Sep 20, 2024 · Updated last year
ChenMnZ / PrefixQuant
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆175 · Nov 26, 2025 · Updated 7 months ago
ustcwhy / BitVLA
Official implementation for BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
☆159 · Mar 2, 2026 · Updated 4 months ago
elicit / fave-dataset
Paper dataset for "Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers"
☆13 · Oct 20, 2024 · Updated last year
Intelligent-Computing-Lab-Panda / GPTAQ
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
☆92 · Jul 28, 2025 · Updated 11 months ago
ysy-phoenix / evalhub
All-in-one benchmarking platform for evaluating LLM.
☆15 · Nov 12, 2025 · Updated 7 months ago
pprp / STBLLM
[ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
☆20 · Jun 3, 2025 · Updated last year
NY1024 / SafeBench
☆22 · Oct 25, 2024 · Updated last year
JeremieMelo / L2ight
☆26 · Nov 10, 2021 · Updated 4 years ago
pa-ba / reg-machine
Coq & Haskell code for Calculating Correct Compilers II
☆12 · Feb 22, 2022 · Updated 4 years ago
cspzyy / RealHiTBench
[ACL 2025] RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis
☆26 · Aug 8, 2025 · Updated 10 months ago
texcoffier / zmw
Zero Memory Widget
☆10 · Dec 30, 2020 · Updated 5 years ago
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆116 · Dec 20, 2024 · Updated last year
aojunzz / DominoSearch
☆19 · Dec 10, 2021 · Updated 4 years ago
cwida / ivm-extension
Incremental View Maintenance support for DuckDB
☆18 · Oct 24, 2023 · Updated 2 years ago
ruikangliu / Quantized-Reasoning-Models
[COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"
☆76 · Jul 8, 2025 · Updated 11 months ago
winter1203 / vllm_GOT2_OCR
Accelerating GOT-OCRv2 with vLLM
☆10 · Nov 15, 2024 · Updated last year
abnerjacobsen / fastapi-mvc-loguru-demo
Demo app with Loguru logging, async middleware to generate X-request-Id. Works with Gunicorn or Uvicorn, and is safe to use with async/th…
☆10 · Feb 2, 2022 · Updated 4 years ago
janhq / llama.cpp
LLM inference in C/C++
☆34 · Updated this week
zsxkib / cog-nvidia-canary-qwen-2.5b
🙊Cogified speech-to-text model nvidia/canary-qwen-2.5b (best ASR model according to hf-audio/open_asr_leaderboard as of 18/Jul/2025)🎙️
☆23 · Jul 28, 2025 · Updated 11 months ago
jha-lab / codebench
[TECS'23] A project on the co-design of Accelerators and CNNs.
☆22 · Dec 10, 2022 · Updated 3 years ago
safety-research / inverse-scaling-ttc
Inverse Scaling in Test-Time Compute
☆25 · Dec 3, 2025 · Updated 7 months ago
kalmarek / RamanujanGraphs.jl
As defined in Lubotzky, Phillips and Sarnak
☆10 · Oct 25, 2022 · Updated 3 years ago