ultralytics / thop
Profile PyTorch models for FLOPs and parameters, helping to evaluate computational efficiency and memory usage.
☆30 Updated last month
Alternatives and similar repositories for thop:
Users who are interested in thop are comparing it to the libraries listed below:
- Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory and energy consumption ☆98 Updated last year
- Recent Advances on Efficient Vision Transformers ☆49 Updated 2 years ago
- [CVPR 2024] PTQ4SAM: Post-Training Quantization for Segment Anything ☆62 Updated 7 months ago
- ☆136 Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆22 Updated 8 months ago
- 📚 FFPA: Yet another Faster Flash Prefill Attention with O(1) ⚡ SRAM complexity for headdim > 256, 1.8x~3x ↑ 🎉 faster than SDPA EA. ☆101 Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆86 Updated this week
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti… ☆47 Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆221 Updated 6 months ago
- [CVPR-2023] Towards Any Structural Pruning ☆16 Updated last year
- ☆198 Updated 3 years ago
- An algorithm for static activation quantization of LLMs ☆115 Updated 2 weeks ago
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model… ☆57 Updated 11 months ago
- ViT inference in Triton because, why not? ☆23 Updated 8 months ago
- Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning". ☆109 Updated last year
- A simple program to calculate and visualize the FLOPs and parameters of PyTorch models, with a handy CLI and an easy-to-use Python API. ☆125 Updated 2 months ago
- ☆68 Updated this week
- [ECCV 2024] Isomorphic Pruning for Vision Models ☆65 Updated 6 months ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆177 Updated last month
- ☆177 Updated this week
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆116 Updated 11 months ago
- [NeurIPS 2023] MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory ☆65 Updated last year
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆57 Updated this week
- Timm model explorer ☆37 Updated 10 months ago
- [ICML 2022] "DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks", by Yonggan … ☆71 Updated 2 years ago
- GPU operators for sparse tensor operations ☆30 Updated 11 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆141 Updated 8 months ago
- ☆157 Updated last year
- The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models ☆94 Updated 11 months ago
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 Updated last year