morousg / cvGPUSpeedup
A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!
☆49Updated this week
Alternatives and similar repositories for cvGPUSpeedup:
Users who are interested in cvGPUSpeedup are comparing it to the libraries listed below:
- A tool to convert a TensorRT engine/plan to a fake ONNX☆38Updated 2 years ago
- Model compression for ONNX☆91Updated 5 months ago
- CLIP and SigLIP models optimized with TensorRT with a Transformers-like API☆23Updated 6 months ago
- A CUDA kernel for NHWC GroupNorm for PyTorch☆18Updated 5 months ago
- Awesome code, projects, books, etc. related to CUDA☆16Updated last week
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se…☆19Updated 11 months ago
- Nsight Systems In Docker☆20Updated last year
- HunyuanDiT with TensorRT and libtorch☆17Updated 10 months ago
- [WIP] Better (FP8) attention for Hopper☆27Updated last month
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP☆54Updated 10 months ago
- [CVPR-2023] Towards Any Structural Pruning☆16Updated last year
- Tianchi NVIDIA TensorRT Hackathon 2023, Generative AI Model Optimization Competition: third-place solution in the preliminary round☆49Updated last year
- A simple Python tool to measure the performance of ONNX models.☆26Updated 7 months ago
- A nvImageCodec library of GPU- and CPU- accelerated codecs featuring a unified interface☆97Updated last month
- Open deep learning compiler stack for cpu, gpu and specialized accelerators☆18Updated last week
- EfficientViT is a new family of vision models for efficient high-resolution vision.☆24Updated last year
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.☆36Updated 2 weeks ago
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX.☆41Updated 7 months ago
- Stable Diffusion in TensorRT 8.5+☆14Updated 2 years ago
- ☆31Updated 10 months ago
- IntLLaMA: A fast and light quantization solution for LLaMA☆18Updated last year
- Snapdragon Neural Processing Engine (SNPE) SDK. The Snapdragon Neural Processing Engine (SNPE) is a Qualcomm Snapdragon software accelerate…☆34Updated 3 years ago
- Python scripts performing Open Vocabulary Object Detection using the YOLO-World model in ONNX.☆51Updated last year
- ONNX Command-Line Toolbox☆35Updated 6 months ago
- Simple tool to change the INPUT and OUTPUT shape of ONNX.☆15Updated 2 weeks ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries.☆53Updated last week
- Memory-Efficient CUDA kernels for training ConvNets with PyTorch.☆40Updated last month
- SealAI's stable diffusion implementation☆75Updated 3 months ago
- An easy way to run, test, benchmark and tune OpenCL kernel files☆23Updated last year
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor…☆62Updated this week