LLM Quantization toolkit
☆20Jun 9, 2026Updated this week
Alternatives and similar repositories for lm-quant-toolkit
Users that are interested in lm-quant-toolkit are comparing it to the libraries listed below.
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation☆47Apr 21, 2026Updated last month
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆11Dec 13, 2023Updated 2 years ago
- ☆50May 9, 2026Updated last month
- Open-source evaluation toolkit of large vision-language models (LVLMs), support ~100 VLMs, 30+ benchmarks☆15Feb 17, 2025Updated last year
- Pytorch implementation of our paper accepted by ICML 2023 -- "Bi-directional Masks for Efficient N:M Sparse Training"☆13Jun 7, 2023Updated 3 years ago
- Cross-Self KV Cache Pruning for Efficient Vision-Language Inference☆10Dec 15, 2024Updated last year
- [ACL'26 Findings] Steering LLM Thinking with Budget Guidance☆31Feb 19, 2026Updated 3 months ago
- BESA is a differentiable weight pruning technique for large language models.☆17Mar 4, 2024Updated 2 years ago
- ☆17May 2, 2024Updated 2 years ago
- ☆21Feb 5, 2024Updated 2 years ago
- [NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning☆20May 31, 2025Updated last year
- Pytorch implementation of our paper accepted by NeurIPS 2022 -- Learning Best Combination for Efficient N:M Sparsity☆22Jan 13, 2023Updated 3 years ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models☆92Oct 22, 2024Updated last year
- [ICML2025] KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference☆28Jan 27, 2026Updated 4 months ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆23Mar 15, 2024Updated 2 years ago
- [COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; 知乎:https://zhuanlan.zhihu.c…☆30Mar 5, 2025Updated last year
- This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act…☆17Oct 25, 2024Updated last year
- Github Repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition☆20Apr 16, 2025Updated last year
- Reading notes on Speculative Decoding papers☆37Jun 2, 2026Updated last week
- Evolutionary-Algorithm and Large-Language-Model☆23Nov 5, 2024Updated last year
- [ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models".☆25Mar 16, 2025Updated last year
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)☆92Jul 28, 2025Updated 10 months ago
- rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)☆12Nov 18, 2024Updated last year
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"☆218Nov 25, 2025Updated 6 months ago
- Julia implementation of flash-attention operation for neural networks.☆11May 31, 2023Updated 3 years ago
- Code for KDD 2023 long paper: MetricPrompt: Prompting Model as a Relevance Metric for Few-Shot Text Classification☆19Aug 10, 2024Updated last year
- ☆129Jan 22, 2024Updated 2 years ago
- This is the source code of our ICML25 paper, titled "Accelerating Large Language Model Reasoning via Speculative Search".☆23Jun 1, 2025Updated last year
- Automatic differentiation of FEniCS and Firedrake models in Julia☆14Mar 21, 2021Updated 5 years ago
- Sparse symmetric indefinite solver implemented with a runtime system☆13May 11, 2020Updated 6 years ago
- SQL Optimizations using MLIR☆12Apr 5, 2020Updated 6 years ago
- Summary notes on quantizing LLMs.☆76Jan 8, 2026Updated 5 months ago
- An MPI wrapper for the pytorch tensor library that is automatically differentiable☆10Mar 27, 2023Updated 3 years ago
- Distributed SDDMM Kernel☆12Jul 8, 2022Updated 3 years ago
- ☆26Feb 22, 2024Updated 2 years ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models☆41Aug 4, 2023Updated 2 years ago
- Model REVOLVER, a human in the loop model mixing system.☆33Aug 2, 2023Updated 2 years ago
- Automatically interact with SVG charts.☆20Sep 23, 2025Updated 8 months ago
- LaTeX Examples Document Source☆11Apr 9, 2024Updated 2 years ago