SqueezeBits / owlite-examples
The OwLite Examples repository offers illustrative example code to help users seamlessly compress PyTorch deep learning models and convert them into TensorRT engines.
☆10 · Updated 9 months ago
Alternatives and similar repositories for owlite-examples
Users interested in owlite-examples are comparing it to the libraries listed below.
- OwLite is a low-code AI model compression toolkit. ☆46 · Updated last month
- ☆54 · Updated 7 months ago
- ☆56 · Updated 2 years ago
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…" ☆63 · Updated last year
- PyTorch CoreSIG ☆55 · Updated 5 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆118 · Updated last year
- Study Group of Deep Learning Compiler ☆160 · Updated 2 years ago
- ☆90 · Updated last year
- A performance library for machine learning applications. ☆184 · Updated last year
- ☆59 · Updated last year
- ☆70 · Updated last month
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆108 · Updated 2 months ago
- ☆100 · Updated last year
- ☆149 · Updated 2 years ago
- LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale ☆119 · Updated last week
- This repository contains integer operators on GPUs for PyTorch. ☆205 · Updated last year
- ☆47 · Updated 3 years ago
- Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines. ☆42 · Updated this week
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24) ☆16 · Updated 11 months ago
- Official implementation of the EMNLP'23 paper "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?" ☆22 · Updated last year
- Provides examples for writing and building Habana custom kernels using the HabanaTools ☆21 · Updated 2 months ago
- Implementation of "Dynamic Model Pruning with Feedback" in PyTorch ☆40 · Updated 3 years ago
- The official NetsPresso Python package. ☆45 · Updated this week
- ☆76 · Updated 2 years ago
- ☆205 · Updated 3 years ago
- ☆25 · Updated 2 years ago
- nnq_cnd_study stands for Neural Network Quantization & Compact Networks Design Study ☆13 · Updated 4 years ago
- Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning". ☆122 · Updated last year
- ☆48 · Updated last year
- [ICML'21 Oral] I-BERT: Integer-only BERT Quantization ☆250 · Updated 2 years ago