morousg / cvGPUSpeedup
A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!
☆51 · Updated this week
Alternatives and similar repositories for cvGPUSpeedup
Users who are interested in cvGPUSpeedup are comparing it to the libraries listed below.
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆39 · Updated 2 years ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆55 · Updated last week
- Awesome code, projects, books, etc. related to CUDA ☆16 · Updated 3 weeks ago
- HunyuanDiT with TensorRT and libtorch ☆17 · Updated 11 months ago
- Model compression for ONNX ☆92 · Updated 5 months ago
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 · Updated last year
- CLIP and SigLIP models optimized with TensorRT with a Transformers-like API ☆23 · Updated 7 months ago
- ☆18 · Updated 2 years ago
- A CUDA kernel for NHWC GroupNorm for PyTorch ☆18 · Updated 5 months ago
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆23 · Updated last year
- ☆31 · Updated 10 months ago
- Study of CUTLASS ☆21 · Updated 6 months ago
- ONNX Command-Line Toolbox ☆35 · Updated 7 months ago
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX ☆46 · Updated 7 months ago
- [CVPR-2023] Towards Any Structural Pruning ☆16 · Updated 2 years ago
- Tianchi NVIDIA TensorRT Hackathon 2023, Generative AI Model Optimization Competition: third-place solution in the preliminary round ☆49 · Updated last year
- Nsight Systems in Docker ☆20 · Updated last year
- [WIP] Better (FP8) attention for Hopper ☆30 · Updated 2 months ago
- Hacks for PyTorch ☆19 · Updated 2 years ago
- Open deep learning compiler stack for CPU, GPU and specialized accelerators ☆18 · Updated last week
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Updated last year
- Repo for event-based binary image reconstruction ☆32 · Updated last year
- A simple Python tool to measure the performance of ONNX models ☆26 · Updated 7 months ago
- Python scripts performing Open Vocabulary Object Detection using the YOLO-World model in ONNX ☆53 · Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency ☆108 · Updated 8 months ago
- ONNX model visualizer ☆87 · Updated last year
- ONNX-compatible LightGlue: Local Feature Matching at Light Speed ☆22 · Updated last year
- Stable Diffusion in TensorRT 8.5+ ☆14 · Updated 2 years ago
- C++ implementations for various tokenizers (sentencepiece, tiktoken, etc.) ☆22 · Updated this week
- ☆16 · Updated last year