zjq0455 / PTQ1.61

☆15

Alternatives and similar repositories for PTQ1.61

Users who are interested in PTQ1.61 are comparing it to the repositories listed below.

csguoh / OBR
[ICLR2026] The first W4A4KV4 quantized + 50% sparse LLMs!
☆21 Updated this week
dongwonjo / FastKV
Official Implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration
☆28 Updated 2 months ago
Intelligent-Computing-Lab-Panda / TesseraQ
☆25 Updated last year
HuangOwen / RoLoRA
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆38 Updated last year
htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆67 Updated last year
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆35 Updated last year
thu-ml / TetraJet-MXFP4Training
Pytorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training
☆34 Updated 7 months ago
aiha-lab / MX-QLLM
LLM Inference with Microscaling Format
☆34 Updated last year
ModelTC / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆39 Updated last year
BaohaoLiao / ApiQ
[EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs
☆15 Updated last year
thu-nics / MBQ
The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models"
☆75 Updated 10 months ago
ylsung / rsq
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
☆20 Updated 7 months ago
thu-nics / R2R
[NeurIPS'25] The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok…
☆75 Updated this week
ArminAzizi98 / LaMDA
☆15 Updated last year
UNITES-Lab / C2R-MoE
[NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…
☆14 Updated 11 months ago
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆98 Updated last year
A-suozhang / MixDQ
[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
☆14 Updated last year
LiqunMa / FBI-LLM
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
☆50 Updated 5 months ago
ruikangliu / Quantized-Reasoning-Models
[COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"
☆67 Updated 6 months ago
AozhongZhang / MagR
☆13 Updated 7 months ago
IST-DASLab / HALO
HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…
☆29 Updated 11 months ago
ziplab / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆31 Updated last year
SAI-Lab-NYU / QSVD
This repository provides the official implementation of QSVD, a method for efficient low-rank approximation that unifies Query-Key-Value …
☆24 Updated 2 months ago
Aaronhuang-778 / Mixture-Compressor-MoE
[ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More
☆66 Updated 11 months ago
enyac-group / Quamba
The official repository of Quamba1 [ICLR 2025] & Quamba2 [ICML 2025]
☆66 Updated 7 months ago
ROIM1998 / APT
[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
☆46 Updated last year
metacarbon / shareAtt
Beyond KV Caching: Shared Attention for Efficient LLMs
☆20 Updated last year
JarvisPei / CMoE
Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
☆33 Updated 10 months ago
BaiTheBest / SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆67 Updated 10 months ago
pixeli99 / MixLN
[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆29 Updated 6 months ago