LLM Quantization toolkit
☆20Apr 13, 2026Updated this week
Alternatives and similar repositories for lm-quant-toolkit
Users that are interested in lm-quant-toolkit are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Pytorch implementation of our paper accepted by ICML 2023 -- "Bi-directional Masks for Efficient N:M Sparse Training"☆13Jun 7, 2023Updated 2 years ago
- Cross-Self KV Cache Pruning for Efficient Vision-Language Inference☆10Dec 15, 2024Updated last year
- BESA is a differentiable weight pruning technique for large language models.☆17Mar 4, 2024Updated 2 years ago
- ☆17May 2, 2024Updated last year
- Pytorch code of [CVPR 2023] "NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction".☆11Mar 14, 2023Updated 3 years ago
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆21Feb 5, 2024Updated 2 years ago
- channel pruning for accelerating very deep neural networks☆13Mar 8, 2021Updated 5 years ago
- Pytorch implementation of our paper accepted by NeurIPS 2022 -- Learning Best Combination for Efficient N:M Sparsity☆22Jan 13, 2023Updated 3 years ago
- Github Repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition☆19Apr 16, 2025Updated 11 months ago
- [ICML2025] KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference☆26Jan 27, 2026Updated 2 months ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆23Mar 15, 2024Updated 2 years ago
- This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act…☆17Oct 25, 2024Updated last year
- A fork of textgen that kept some things like Exllama and old GPTQ.☆22Aug 20, 2024Updated last year
- Reading notes on Speculative Decoding papers☆29Feb 24, 2026Updated last month
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)☆88Jul 28, 2025Updated 8 months ago
- rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)☆12Nov 18, 2024Updated last year
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"☆212Nov 25, 2025Updated 4 months ago
- Julia implementation of flash-attention operation for neural networks.☆11May 31, 2023Updated 2 years ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆39Mar 11, 2024Updated 2 years ago
- This is the source code of our ICML25 paper, titled "Accelerating Large Language Model Reasoning via Speculative Search".☆23Jun 1, 2025Updated 10 months ago
- ☆129Jan 22, 2024Updated 2 years ago
- Automatic differentiation of FEniCS and Firedrake models in Julia☆14Mar 21, 2021Updated 5 years ago
- Sparse symmetric indefinite solver implemented with a runtime system☆13May 11, 2020Updated 5 years ago
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- SQL Optimizations using MLIR☆12Apr 5, 2020Updated 6 years ago
- Official PyTorch implementation of the paper "Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Princ…☆42Jul 18, 2025Updated 8 months ago
- An MPI wrapper for the pytorch tensor library that is automatically differentiable☆10Mar 27, 2023Updated 3 years ago
- Distributed SDDMM Kernel☆12Jul 8, 2022Updated 3 years ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models☆41Aug 4, 2023Updated 2 years ago
- Automatically interact with SVG charts.☆19Sep 23, 2025Updated 6 months ago
- LaTeX Examples Document Source☆11Apr 9, 2024Updated 2 years ago
- Allows an AbstractArray, to look like an AbstractArray with one more dimension and the tiles are represented along this dimension.☆14May 3, 2025Updated 11 months ago
- Code Repository of Evaluating Quantized Large Language Models☆134Sep 8, 2024Updated last year
- Deploy open-source AI quickly and easily - Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- Implementation of Effective Sparsification of Neural Networks with Global Sparsity Constraint☆31Mar 24, 2022Updated 4 years ago
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025)☆15Jan 6, 2026Updated 3 months ago
- An Attention Superoptimizer☆22Jan 20, 2025Updated last year
- A plugin for presenting an IPFS gateway over i2p☆17Apr 25, 2020Updated 5 years ago
- HiCOPS: Computational framework for peptide identification from MS data through accelerated database search☆10Mar 24, 2023Updated 3 years ago
- Build TVM docker image for production compilation deployments☆12Sep 7, 2021Updated 4 years ago
- ja is a small CLI / TUI app that allows you to work with AI tools☆20Mar 30, 2026Updated 2 weeks ago