aiha-lab / TSLD
[NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
⭐18 · Updated last year
Alternatives and similar repositories for TSLD
Users interested in TSLD are comparing it to the repositories listed below.
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ⭐171 · Updated last year
- Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ⭐72 · Updated 3 months ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models ⭐44 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ⭐153 · Updated 4 months ago
- LLM Inference with Microscaling Format ⭐31 · Updated 10 months ago
- AFPQ code implementation ⭐23 · Updated last year
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs. ⭐123 · Updated last year
- This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs ⭐40 · Updated last year
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model… ⭐66 · Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ⭐38 · Updated last year
- Code Repository of Evaluating Quantized Large Language Models ⭐132 · Updated last year
- [COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.c… ⭐28 · Updated 7 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ⭐80 · Updated 11 months ago
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ⭐115 · Updated 3 months ago
- ⭐46 · Updated 11 months ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization" ⭐106 · Updated 4 months ago
- ⭐20 · Updated last year
- ⭐22 · Updated 11 months ago
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ⭐61 · Updated last year
- [ICLR 2025] OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt… ⭐78 · Updated 6 months ago
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ⭐24 · Updated last year
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ⭐66 · Updated 6 months ago
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692) ⭐65 · Updated 2 months ago
- ⭐55 · Updated last year
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://arx… ⭐23 · Updated 7 months ago
- ⭐28 · Updated 10 months ago
- [COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models" ⭐54 · Updated 3 months ago
- Awesome LLM pruning papers: an all-in-one repository integrating all useful resources and insights. ⭐123 · Updated 2 months ago
- Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ⭐37 · Updated 8 months ago
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ⭐142 · Updated 7 months ago