leimao / PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration
TensorRT Acceleration for PyTorch Native Eager Mode Quantization Models
☆17 · Updated last year
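The repository's subject, PyTorch native eager-mode quantization, follows a prepare/calibrate/convert flow before any TensorRT deployment. A minimal sketch of that flow, assuming PyTorch's `torch.ao.quantization` eager-mode API; `TinyNet` is a hypothetical toy model for illustration, not taken from the repository:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qconfig, prepare, convert,
)

class TinyNet(nn.Module):  # hypothetical toy model
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # fp32 -> int8 at the model input
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # int8 -> fp32 at the model output
    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = TinyNet().eval()
model.qconfig = get_default_qconfig("fbgemm")  # x86 server backend
prepared = prepare(model)                      # insert observers
with torch.no_grad():                          # calibrate on sample data
    prepared(torch.randn(4, 3, 32, 32))
quantized = convert(prepared)                  # swap in quantized modules
out = quantized(torch.randn(1, 3, 32, 32))
```

From here, a project like this one would export the quantized model (e.g. via ONNX) and build a TensorRT engine from it; the exact export path is repo-specific.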
Alternatives and similar repositories for PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration
Users interested in PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration are comparing it to the libraries listed below.
- Experimental CUDA kernel framework unifying typed dimensions, NVRTC JIT specialization, and ML-guided tuning. ☆46 · Updated last week
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆54 · Updated 2 months ago
- PyTorch Pruning Example ☆51 · Updated 3 years ago
- Model compression for ONNX ☆98 · Updated last year
- Timm model explorer ☆42 · Updated last year
- Converting weights of PyTorch models to ONNX & TensorRT engines ☆50 · Updated 2 years ago
- Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory, and energy consumption ☆109 · Updated 2 years ago
- The Triton backend for TensorRT. ☆85 · Updated this week
- A set of simple tools for splitting, merging, OP deletion, size compression, rewriting attributes and constants, OP generation, change op… ☆303 · Updated last year
- Profile PyTorch models for FLOPs and parameters, helping to evaluate computational efficiency and memory usage. ☆121 · Updated last month
- ☆178 · Updated 2 years ago
- The Triton backend for the PyTorch TorchScript models. ☆173 · Updated this week
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆96 · Updated this week
- A toolkit to help optimize large ONNX models ☆163 · Updated 3 months ago
- Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration ☆79 · Updated 8 months ago
- A fast and customizable CUDA int4 Tensor Core GEMM ☆15 · Updated last year
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆432 · Updated this week
- Implementation of YOLOv9 QAT optimized for deployment on TensorRT platforms. ☆129 · Updated 9 months ago
- Step-by-step implementation of a fast softmax kernel in CUDA ☆60 · Updated last year
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆85 · Updated this week
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆306 · Updated last year
- Common utilities for ONNX converters ☆294 · Updated last month
- A tool to convert a TensorRT engine/plan into a fake ONNX model ☆42 · Updated 3 years ago
- Notes on quantization in neural networks ☆117 · Updated 2 years ago
- Count number of parameters / MACs / FLOPS for ONNX models. ☆95 · Updated last year
- This repository describes how to add a custom TensorRT plugin in C++ and Python ☆29 · Updated 4 years ago
- DeepStream Libraries offer CVCUDA, NvImageCodec, and PyNvVideoCodec modules as Python APIs for seamless integration into custom framewor… ☆77 · Updated 4 months ago
- Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual Language Models ☆87 · Updated 9 months ago
- The Triton backend for the ONNX Runtime. ☆172 · Updated this week
- A block-oriented training approach for inference-time optimization. ☆34 · Updated last year