morousg / cvGPUSpeedup
A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!
☆51 · Updated last week
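For context, the baseline this project competes with is OpenCV's own CUDA module, where every operation is a separate call (and kernel launch). A minimal sketch of that plain `cv2.cuda` pipeline, assuming an OpenCV build with CUDA support and placeholder file name and sizes, might look like this; it is not cvGPUSpeedup's own API:

```python
# Plain OpenCV CUDA pipeline (baseline, NOT cvGPUSpeedup's API).
# Assumes an OpenCV build with CUDA support; "frame.jpg" and the target
# size are placeholders.
import cv2

img = cv2.imread("frame.jpg")                    # H x W x 3 uint8 on the host

gpu = cv2.cuda_GpuMat()
gpu.upload(img)                                  # host -> device copy

gpu = cv2.cuda.resize(gpu, (640, 384))           # one kernel launch
gpu = cv2.cuda.cvtColor(gpu, cv2.COLOR_BGR2RGB)  # another kernel launch

out = gpu.download()                             # device -> host copy
```

Each stage reads and writes device memory on its own, which is the per-operation overhead a faster, fused implementation tries to avoid.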
Alternatives and similar repositories for cvGPUSpeedup
Users that are interested in cvGPUSpeedup are comparing it to the libraries listed below
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆39 · Updated 2 years ago
- Model compression for ONNX ☆96 · Updated 6 months ago
- Awesome code, projects, books, etc. related to CUDA ☆17 · Updated last month
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆56 · Updated 3 weeks ago
- CLIP and SigLIP models optimized with TensorRT with a Transformers-like API ☆25 · Updated 8 months ago
- HunyuanDiT with TensorRT and libtorch ☆17 · Updated last year
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆23 · Updated last year
- ☆18 · Updated 2 years ago
- nvImageCodec, a library of GPU- and CPU-accelerated codecs featuring a unified interface ☆104 · Updated 2 months ago
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators ☆18 · Updated last week
- [WIP] Better (FP8) attention for Hopper ☆30 · Updated 3 months ago
- A CUDA kernel for NHWC GroupNorm for PyTorch ☆18 · Updated 6 months ago
- AI-related samples made available by the DevTech ProViz team ☆30 · Updated last year
- ☆31 · Updated 11 months ago
- Nsight Systems in Docker ☆20 · Updated last year
- NVIDIA TensorRT Hackathon 2023 final-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM ☆42 · Updated last year
- ONNX Command-Line Toolbox ☆35 · Updated 7 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆109 · Updated 8 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 · Updated 2 weeks ago
- ☆29 · Updated 4 months ago
- Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual Language Models ☆74 · Updated last month
- A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB,… (a minimal onnxsim sketch follows this list) ☆17 · Updated last year
- ☆24 · Updated 2 years ago
- Simple tool for partial optimization of ONNX. Further optimize some models that cannot be optimized with onnx-optimizer and onnxsim by se… ☆19 · Updated last year
- Timm model explorer ☆39 · Updated last year
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference. ☆36 · Updated 2 months ago
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX (see the generic ONNX Runtime sketch after this list). ☆47 · Updated 8 months ago
- TensorRT Acceleration for PyTorch Native Eager Mode Quantization Models ☆15 · Updated 10 months ago
- New operators for the ReferenceEvaluator, new kernels for onnxruntime (CPU and CUDA) ☆32 · Updated 2 months ago
- ☆70 · Updated 2 weeks ago
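Referenced above for the onnx-simplifier workarounds: a minimal sketch of the standard onnxsim flow, with placeholder file names, showing where the 2 GB protobuf limit bites:

```python
# Minimal onnx-simplifier flow; file names are placeholders.
import onnx
from onnxsim import simplify

model = onnx.load("model.onnx")
simplified, ok = simplify(model)   # constant folding + graph cleanup
assert ok, "simplified model failed the correctness check"

# Graphs whose serialized size exceeds protobuf's 2 GB limit must be saved
# with external data; that limit is what the tools above work around.
onnx.save(simplified, "model_simplified.onnx", save_as_external_data=True)
```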
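And the generic ONNX Runtime pattern referenced above for the NeuFlowV2 scripts; the model path, dummy shape, and single-input assumption are placeholders, not NeuFlowV2's actual interface:

```python
# Generic ONNX Runtime inference with the CUDA execution provider.
# Model path and the 1x3x224x224 dummy input are assumptions.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
input_name = sess.get_inputs()[0].name
outputs = sess.run(None, {input_name: dummy})
print([o.shape for o in outputs])
```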