ultralytics / thop
Profile PyTorch models for FLOPs and parameters, helping to evaluate computational efficiency and memory usage.
☆20 · Updated this week
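For a quick sense of what thop does, here is a minimal usage sketch based on its documented `profile` API; the ResNet-18 model and the input shape are only illustrative choices, not part of the project itself:

```python
import torch
from torchvision.models import resnet18
from thop import profile, clever_format

# Any torch.nn.Module plus a representative input tensor.
model = resnet18()
dummy_input = torch.randn(1, 3, 224, 224)

# profile() traces one forward pass and returns MAC and parameter counts.
macs, params = profile(model, inputs=(dummy_input,))

# clever_format() renders the raw counts in human-readable units (e.g. GMac, M params).
macs, params = clever_format([macs, params], "%.3f")
print(f"MACs: {macs}, Params: {params}")
```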
Related projects
Alternatives and complementary repositories for thop
- NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications. ☆179 · Updated 5 months ago
- Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory, and energy consumption. ☆92 · Updated last year
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX. ☆127 · Updated 3 weeks ago
- Count the number of parameters / MACs / FLOPs for ONNX models (a minimal parameter-counting sketch appears after this list). ☆89 · Updated 3 weeks ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference. ☆112 · Updated 8 months ago
- Inference of quantization-aware trained networks using TensorRT. ☆79 · Updated last year
- Implementation of YOLOv9 QAT optimized for deployment on TensorRT platforms. ☆84 · Updated 2 weeks ago
- The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API. ☆125 · Updated 2 weeks ago
- Sample app code for deploying TAO Toolkit trained models to Triton. ☆84 · Updated 2 months ago
- Model Compression Toolkit (MCT) is an open-source project for neural network model optimization under efficient, constrained hardware… ☆330 · Updated this week
- A tutorial for getting started with the Deep Learning Accelerator (DLA) on NVIDIA Jetson. ☆288 · Updated 2 years ago
- A toolkit to help optimize large ONNX models. ☆149 · Updated 6 months ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆166 · Updated 3 months ago
- A code generator from ONNX to PyTorch code. ☆133 · Updated 2 years ago
- A toolkit to help optimize ONNX models. ☆81 · Updated this week
- Model compression for ONNX. ☆75 · Updated this week
- This repository contains integer operators on GPUs for PyTorch. ☆184 · Updated last year
- Cataloging released Triton kernels. ☆138 · Updated 2 months ago
- This repository provides a YOLOv5 GPU optimization sample. ☆100 · Updated last year
- Applied AI experiments and examples for PyTorch. ☆168 · Updated 3 weeks ago
- Standalone Flash Attention v2 kernel without a libtorch dependency. ☆98 · Updated 2 months ago
- Simple and fast low-bit matmul kernels in CUDA / Triton. ☆147 · Updated this week
- LLaMA INT4 CUDA inference with AWQ. ☆48 · Updated 4 months ago
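As referenced in the ONNX parameter/MACs counter entry above: counting parameters for an ONNX model can be done with nothing more than the official `onnx` package, by summing the sizes of the stored weight initializers. This is a minimal sketch of the idea under that assumption, not that project's API, and the model path is hypothetical:

```python
import numpy as np
import onnx

# Load a serialized ONNX model (the path is a placeholder).
model = onnx.load("model.onnx")

# Every trained weight lives in the graph as an initializer tensor;
# the parameter count is the product of each tensor's dims, summed.
total_params = sum(int(np.prod(list(init.dims))) for init in model.graph.initializer)
print(f"Total parameters: {total_params:,}")
```

Counting MACs/FLOPs is more involved, since it requires per-operator shape inference, which is what the dedicated tools in this list handle.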