NVIDIA / Model-Optimizer
A unified library of state-of-the-art (SOTA) model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to improve inference speed.
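To make the quantization technique named above concrete, here is a minimal pure-Python sketch of symmetric per-tensor int8 weight quantization, the basic idea behind post-training quantization. This is a conceptual illustration only, not the Model-Optimizer API; the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization.

    The scale maps the largest absolute weight to 127, and each
    weight is rounded to the nearest int8 value in [-128, 127].
    (Conceptual sketch; not the Model-Optimizer API.)
    """
    amax = max(abs(w) for w in weights) or 1.0
    scale = amax / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 1.27, -1.27]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
```

Real toolkits add per-channel scales, calibration over activation data, and hardware-aware formats (e.g. FP8), but the round-to-a-scaled-integer-grid idea is the same.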
2,031 stars · Updated this week

Alternatives and similar repositories for Model-Optimizer

Users interested in Model-Optimizer are comparing it to the libraries listed below.

