☆19 · Updated Apr 3, 2025
Alternatives and similar repositories for calibquant
Users interested in calibquant are comparing it to the libraries listed below.
- UnitEval is a benchmarking and evaluation tool for AutoDev Coder. (☆13, updated Jan 2, 2024)
- Model Quantization Benchmark (☆18, updated Sep 30, 2025)
- A repository of Binary General Matrix Multiply (BGEMM) via customized CUDA kernels. Thanks to FP6-LLM for the groundwork! (☆18, updated Aug 30, 2024)
- (☆17, updated Oct 16, 2024)
- AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference (☆20, updated Jan 24, 2025)
- [ICLR 2025] Official implementation of the paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models". (☆23, updated Mar 16, 2025)
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… (☆28, updated Jul 15, 2025)
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models (☆49, updated Nov 5, 2024)
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models (☆28, updated Aug 5, 2025)
- Domain-Specific Architecture Generator 2 (☆22, updated Oct 2, 2022)
- (☆26, updated Mar 1, 2024)
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" (☆211, updated Nov 25, 2025)
- FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model] (☆46, updated Feb 17, 2026)
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin… (☆41, updated Sep 9, 2025)
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer (☆30, updated Dec 6, 2023)
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… (☆39, updated Mar 11, 2024)
- This is the official PyTorch implementation for the paper: Towards Accurate Post-training Quantization for Diffusion Models. (CVPR24 Poste… (☆38, updated Jun 4, 2024)
- Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming (☆35, updated Jun 29, 2023)
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation (☆26, updated Jun 16, 2025)
- [NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models. (☆98, updated Jan 3, 2025)
- PyTorch Quantization Framework For OCP MX Datatypes. (☆16, updated May 30, 2025)
- (☆11, updated Aug 20, 2025)
- SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia (☆42, updated Mar 13, 2023)
- (☆14, updated Jan 24, 2025)
- [CVPR 2022] AlignQ: Alignment Quantization with ADMM-based Correlation Preservation (☆11, updated Jan 6, 2023)
- A proc macro regex library to match an arbitrary string or byte array against a regular expression. (☆11, updated Jun 5, 2022)
- Automatic login for the Beihang University campus network gateway (☆10, updated Nov 8, 2021)
- An efficient distillation method for flow matching models (☆22, updated Feb 1, 2026)
- Convert shared libraries into relocatable objects (☆10, updated Dec 23, 2023)
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) (☆11, updated Apr 18, 2025)
- (☆13, updated May 21, 2023)
- Chameleon: A MatMul-Free TCN Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data (☆25, updated Jun 6, 2025)
- KAF: Kolmogorov-Arnold Fourier Networks (☆20, updated Feb 19, 2025)
- Code for our paper "Exploring Bit-Slice Sparsity in Deep Neural Networks for Efficient ReRAM-Based Deployment" [NeurIPS'19 EMC2 workshop]… (☆10, updated Oct 12, 2020)
- Express DLA implementation for FPGA, revised from NVDLA. (☆11, updated Oct 17, 2019)
- An LR(1) parser generator targeting C++17. (☆13, updated Jul 8, 2020)
- MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models. (☆16, updated Apr 8, 2025)
- An implementation of memcpy for amd64 with clang/gcc (☆15, updated Feb 7, 2022)
- Training Quantized Neural Networks with a Full-precision Auxiliary Module (☆13, updated Jun 19, 2020)