morousg / cvGPUSpeedup

A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!

☆51

Alternatives and similar repositories for cvGPUSpeedup

Users who are interested in cvGPUSpeedup are comparing it to the repositories listed below.

onnx / neural-compressor
Model compression for ONNX
☆96 Updated 7 months ago
dusty-nv / NanoDB
Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP
☆59 Updated 2 months ago
caibucai22 / awesome-cuda
Awesome code, projects, books, etc. related to CUDA
☆19 Updated this week
triple-Mu / TensorRT2ONNX
A tool that converts a TensorRT engine/plan to a fake ONNX
☆40 Updated 2 years ago
dusty-nv / clip_trt
CLIP and SigLIP models optimized with TensorRT with a Transformers-like API
☆27 Updated 9 months ago
NVlabs / EfficientDL
☆33 Updated last month
megvii-research / IntLLaMA
IntLLaMA: A fast and light quantization solution for LLaMA
☆18 Updated last year
latentCall145 / channels-last-groupnorm
A CUDA kernel for NHWC GroupNorm for PyTorch
☆19 Updated 8 months ago
pytorch-labs / tokenizers
C++ implementations of various tokenizers (sentencepiece, tiktoken, etc.).
☆32 Updated this week
MILVLG / mlc-imp
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
☆10 Updated last year
AXERA-TECH / CLIP-ONNX-AX650-CPP
☆27 Updated 2 weeks ago
FeiGeChuanShu / trt2023
NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM
☆42 Updated last year
triple-Mu / HunyuanDiT-TensorRT-libtorch
HunyuanDiT with TensorRT and libtorch
☆17 Updated last year
ahennequ / cuda-tensorcores-register-mapping
☆18 Updated 2 years ago
jinmingyi1998 / opencl_kernels
An easy way to run, test, benchmark and tune OpenCL kernel files
☆23 Updated last year
Bruce-Lee-LY / decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA and MLA, using CUDA cores for the decoding stage of LLM inference.
☆38 Updated last month
tile-ai / tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
☆19 Updated last week
hisrg / SNPE
Snapdragon Neural Processing Engine (SNPE) SDK. The Snapdragon Neural Processing Engine (SNPE) is a Qualcomm Snapdragon software accelerate…
☆34 Updated 3 years ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆26 Updated 9 months ago
CVHub520 / efficientvit
EfficientViT is a new family of vision models for efficient high-resolution vision.
☆26 Updated last year
habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…
☆23 Updated 3 weeks ago
tlc-pack / libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency
☆110 Updated 10 months ago
tsingmicro-toolchain / OnnxSlim
A toolkit to help optimize large ONNX models
☆157 Updated last year
seth-lu / Im2win
☆14 Updated 2 years ago
WaveSpeedAI / QuantumAttention
[WIP] Better (FP8) attention for Hopper
☆31 Updated 4 months ago
hova88 / CUDA-MatMul-Practice
☆17 Updated last year
kyutai-labs / jax-flash-attn3
JAX bindings for the flash-attention3 kernels
☆11 Updated 11 months ago
ibaiGorordo / ONNX-YOLO-World-Open-Vocabulary-Object-Detection
Python scripts performing Open Vocabulary Object Detection using the YOLO-World model in ONNX.
☆55 Updated last year
TRT2022 / ControlNet_TensorRT
Tianchi NVIDIA TensorRT Hackathon 2023, Generative AI Model Optimization Competition: third-place solution in the preliminary round
☆49 Updated last year
lucasjinreal / wnnx_models
Various test models in WNNX format. They can be viewed with `pip install wnetron && wnetron`
☆12 Updated 3 years ago