LLM Quantization toolkit
☆20May 2, 2026Updated this week
Alternatives and similar repositories for lm-quant-toolkit
Users that are interested in lm-quant-toolkit are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆11Dec 13, 2023Updated 2 years ago
- Multichannel Looper/Feedback System for Riffusion☆14May 6, 2023Updated 3 years ago
- Open-source evaluation toolkit of large vision-language models (LVLMs), support ~100 VLMs, 30+ benchmarks☆15Feb 17, 2025Updated last year
- Pytorch implementation of our paper accepted by ICML 2023 -- "Bi-directional Masks for Efficient N:M Sparse Training"☆13Jun 7, 2023Updated 2 years ago
- ☆12Apr 4, 2024Updated 2 years ago
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- Steering LLM Thinking with Budget Guidance☆30Feb 19, 2026Updated 2 months ago
- BESA is a differentiable weight pruning technique for large language models.☆17Mar 4, 2024Updated 2 years ago
- ☆17May 2, 2024Updated 2 years ago
- ☆21Feb 5, 2024Updated 2 years ago
- [NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning☆19May 31, 2025Updated 11 months ago
- Pytorch implementation of our paper accepted by NeurIPS 2022 -- Learning Best Combination for Efficient N:M Sparsity☆22Jan 13, 2023Updated 3 years ago
- Stalker Checker MAC Generator Portal IPTV FREE☆17Nov 18, 2024Updated last year
- Activation-aware Singular Value Decomposition for Compressing Large Language Models☆92Oct 22, 2024Updated last year
- A tool for bridging MAC address portals with Plex or M3U-based software.☆27Dec 9, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- [ICML2025] KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference☆28Jan 27, 2026Updated 3 months ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆23Mar 15, 2024Updated 2 years ago
- This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act…☆17Oct 25, 2024Updated last year
- [COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; 知乎:https://zhuanlan.zhihu.c…☆30Mar 5, 2025Updated last year
- Mobile App code for Android & iOS on React Native☆26Dec 7, 2023Updated 2 years ago
- Reading notes on Speculative Decoding papers☆31Apr 16, 2026Updated 2 weeks ago
- Evolutionary-Algorithm and Large-Language-Model☆23Nov 5, 2024Updated last year
- [ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models".☆24Mar 16, 2025Updated last year
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)☆89Jul 28, 2025Updated 9 months ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)☆12Nov 18, 2024Updated last year
- PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation [NeurIPS 2025]☆18Oct 11, 2025Updated 6 months ago
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"☆214Nov 25, 2025Updated 5 months ago
- Julia implementation of flash-attention operation for neural networks.☆11May 31, 2023Updated 2 years ago
- ☆18Sep 21, 2022Updated 3 years ago
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch…☆28Jul 15, 2025Updated 9 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆40Mar 11, 2024Updated 2 years ago
- ☆129Jan 22, 2024Updated 2 years ago
- SQL Optimizations using MLIR☆12Apr 5, 2020Updated 6 years ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- 记录量化LLM中的总结。☆72Jan 8, 2026Updated 3 months ago
- An MPI wrapper for the pytorch tensor library that is automatically differentiable☆10Mar 27, 2023Updated 3 years ago
- Distributed SDDMM Kernel☆12Jul 8, 2022Updated 3 years ago
- ☆26Feb 22, 2024Updated 2 years ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models☆41Aug 4, 2023Updated 2 years ago
- Model REVOLVER, a human in the loop model mixing system.☆33Aug 2, 2023Updated 2 years ago
- Automatically interact with SVG charts.☆19Sep 23, 2025Updated 7 months ago