morousg / cvGPUSpeedup
A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!
☆50 Updated this week
Alternatives and similar repositories for cvGPUSpeedup:
Users interested in cvGPUSpeedup are comparing it to the libraries listed below:
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆38 Updated 2 years ago
- Model compression for ONNX ☆87 Updated 4 months ago
- HunyuanDiT with TensorRT and libtorch ☆17 Updated 10 months ago
- CLIP and SigLIP models optimized with TensorRT with a Transformers-like API ☆22 Updated 5 months ago
- A simple Python tool to measure the performance of ONNX models. ☆26 Updated 6 months ago
- ☆18 Updated 2 years ago
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX. ☆41 Updated 6 months ago
- A CUDA kernel for NHWC GroupNorm for PyTorch ☆18 Updated 4 months ago
- Docker scripts for building ONNX Runtime with TensorRT and OpenVINO in manylinux environment ☆22 Updated last year
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆49 Updated 9 months ago
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆23 Updated last year
- Stable Diffusion in TensorRT 8.5+ ☆14 Updated 2 years ago
- ONNX Command-Line Toolbox ☆35 Updated 5 months ago
- EfficientViT is a new family of vision models for efficient high-resolution vision. ☆24 Updated last year
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 Updated 10 months ago
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… ☆16 Updated 10 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆106 Updated 6 months ago
- New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA ☆32 Updated this week
- ☆32 Updated last year
- ☆31 Updated 9 months ago
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM ☆41 Updated last year
- Python scripts performing Open Vocabulary Object Detection using the YOLO-World model in ONNX. ☆49 Updated 11 months ago
- FlexAttention w/ FlashAttention3 Support ☆26 Updated 5 months ago
- Python scripts for performing monocular depth estimation using the SC_Depth model in ONNX ☆31 Updated 2 years ago
- The nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface ☆92 Updated this week
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 Updated last year
- Nsight Systems In Docker ☆20 Updated last year