Efficient-ML/Qwen3-Quantization

☆75

Alternatives and similar repositories for Qwen3-Quantization

Users who are interested in Qwen3-Quantization are comparing it to the repositories listed below.

Xingyu-Zheng / BinaryDM
(ICLR 2025) BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
☆25 · Oct 4, 2024 · Updated last year
YanjingLi0202 / Bi-ViT
The official implementation of the AAAI 2024 paper Bi-ViT.
☆13 · Dec 18, 2023 · Updated 2 years ago
UNITES-Lab / HEXA-MoE
Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy"
☆15 · Mar 6, 2025 · Updated last year
HuangOwen / RoLoRA
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆39 · Sep 24, 2024 · Updated last year
GATECH-EIC / SuperTickets
[ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning
☆20 · Jul 7, 2022 · Updated 3 years ago
UNITES-Lab / C2R-MoE
[NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…
☆16 · Feb 4, 2025 · Updated last year
Lucanyc / VISTA-Gym
☆23 · Mar 17, 2026 · Updated last month
Xingyu-Zheng / BiDM
(NeurIPS 2024) BiDM: Pushing the Limit of Quantization for Diffusion Models
☆22 · Nov 20, 2024 · Updated last year
wlzhao22 / tsdg
TSDG: An efficient index graph for graph-based nearest neighbor search
☆10 · Jul 14, 2022 · Updated 3 years ago
Cornell-RelaxML / yaqa-quantization
☆76 · Jun 20, 2025 · Updated 10 months ago
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆21 · Feb 27, 2025 · Updated last year
zyxxmu / Bi-Mask
PyTorch implementation of our paper accepted by ICML 2023: "Bi-directional Masks for Efficient N:M Sparse Training"
☆13 · Jun 7, 2023 · Updated 2 years ago
ludc506 / InternVL-X
☆16 · Mar 26, 2025 · Updated last year
chengtao-lv / PTQ4SAM
[CVPR 2024] PTQ4SAM: Post-Training Quantization for Segment Anything
☆85 · Jun 26, 2024 · Updated last year
DravenALG / ReSTE
(ICCV 2023) Official implementation of Rectified Straight Through Estimator (ReSTE).
☆34 · Sep 20, 2024 · Updated last year
ChenMnZ / PrefixQuant
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆171 · Nov 26, 2025 · Updated 5 months ago
ustcwhy / BitVLA
Official implementation for BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
☆147 · Mar 2, 2026 · Updated 2 months ago
AI-secure / adversarial-glue
[NeurIPS 2021] "Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models" by Boxin Wang*, Chejian Xu*, Shuoh…
☆13 · Apr 3, 2023 · Updated 3 years ago
ShiheWang / FIMA-Q
[CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
☆29 · Jun 16, 2025 · Updated 10 months ago
Purewhite2019 / formal_problem_solving_main
Official implementation of "Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving"
☆29 · May 8, 2025 · Updated 11 months ago
Intelligent-Computing-Lab-Panda / GPTAQ
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
☆89 · Jul 28, 2025 · Updated 9 months ago
ysy-phoenix / evalhub
All-in-one benchmarking platform for evaluating LLMs.
☆15 · Nov 12, 2025 · Updated 5 months ago
NY1024 / SafeBench
☆22 · Oct 25, 2024 · Updated last year
pa-ba / reg-machine
Coq & Haskell code for Calculating Correct Compilers II
☆12 · Feb 22, 2022 · Updated 4 years ago
SongLee24 / contacts-app
A SQLite-based contacts app written last year, now migrated to Android Studio with a polished interface.
☆14 · Jan 25, 2015 · Updated 11 years ago
ankushmandal / topkapi
☆15 · Nov 6, 2018 · Updated 7 years ago
texcoffier / zmw
Zero Memory Widget
☆10 · Dec 30, 2020 · Updated 5 years ago
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆113 · Dec 20, 2024 · Updated last year
cwida / ivm-extension
Incremental View Maintenance support for DuckDB
☆16 · Oct 24, 2023 · Updated 2 years ago
ruikangliu / Quantized-Reasoning-Models
[COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"
☆73 · Jul 8, 2025 · Updated 9 months ago
mit-han-lab / neurips-micronet
[JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion
☆42 · Feb 26, 2021 · Updated 5 years ago
winter1203 / vllm_GOT2_OCR
Accelerating GOT-OCRv2 with vLLM
☆10 · Nov 15, 2024 · Updated last year
abnerjacobsen / fastapi-mvc-loguru-demo
Demo app with Loguru logging, async middleware to generate X-request-Id. Works with Gunicorn or Uvicorn, and is safe to use with async/th…
☆10 · Feb 2, 2022 · Updated 4 years ago
janhq / llama.cpp
LLM inference in C/C++
☆29 · Apr 28, 2026 · Updated last week
jha-lab / codebench
[TECS'23] A project on the co-design of Accelerators and CNNs.
☆21 · Dec 10, 2022 · Updated 3 years ago
safety-research / inverse-scaling-ttc
Inverse Scaling in Test-Time Compute
☆25 · Dec 3, 2025 · Updated 5 months ago
bytedance / AffineQuant
Official implementation of the ICLR 2024 paper AffineQuant
☆30 · Mar 30, 2024 · Updated 2 years ago
Lornatang / MobileNetV1-PyTorch
PyTorch implementation of the `MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications` paper.
☆16 · May 25, 2023 · Updated 2 years ago
IlyasMoutawwakil / Faster-TrOCR
TrOCR but 2 to 3 times faster
☆11 · Oct 22, 2022 · Updated 3 years ago