ConsciousML / img-processing-cuda

Implementation from scratch in CUDA C++ of image processing algorithms.

☆14

Alternatives and similar repositories for img-processing-cuda

Users who are interested in img-processing-cuda are comparing it to the libraries listed below.

KhosroBahrami / ImageFiltering_CUDA
Image Filtering using CUDA
☆27 Updated 6 years ago
jundaf2 / CUDA-INT8-GEMM
CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API
☆30 Updated last year
hova88 / CUDA-MatMul-Practice
☆16 Updated last year
emptysoal / cuda-image-preprocess
Speed up image preprocessing with CUDA for image handling or TensorRT inference
☆68 Updated 3 weeks ago
caiwanxianhust / FasterLLaMA
A LLaMA model inference framework implemented in CUDA C++
☆57 Updated 6 months ago
ThoenigAdrian / NeuralNetworksCudaTutorial
Implement Neural Networks in CUDA from Scratch
☆23 Updated last year
ozanarmagan / clip_tokenizer_cpp
☆10 Updated 10 months ago
leimao / TensorRT-Custom-Plugin-Example
Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration
☆60 Updated last week
zpye / SimpleInfer
A simple neural network inference framework
☆25 Updated last year
TRT2022 / ControlNet_TensorRT
Tianchi NVIDIA TensorRT Hackathon 2023: third-place solution in the preliminary round of the generative AI model optimization competition
☆49 Updated last year
wangzyon / trt_learn
TensorRT encapsulation: learning, rewriting, and practice.
☆28 Updated 2 years ago
raymond1123 / hgemm
☆29 Updated 6 months ago
inisis / OnnxLLM
Large Language Model ONNX Inference Framework
☆35 Updated 4 months ago
NVIDIA-AI-IOT / NVIDIA-Optical-Character-Detection-and-Recognition-Solution
This repository provides an optical character detection and recognition solution optimized for NVIDIA devices.
☆75 Updated 3 weeks ago
Bruce-Lee-LY / decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.
☆36 Updated 2 months ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆189 Updated 10 months ago
yester31 / Cutlass_EX
Study of CUTLASS
☆21 Updated 6 months ago
cvdong / TRT_PRO_LEARN
Notes on understanding the open-source tensorRT_Pro project
☆21 Updated 2 years ago
zeroine / cutlass-cute-sample
☆33 Updated last year
JieRen98 / SGEMM-SASS-Annotation
☆21 Updated 4 years ago
CisMine / Guide-NVIDIA-Tools
NVIDIA tools guide
☆133 Updated 5 months ago
Bruce-Lee-LY / cuda_hgemv
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
☆62 Updated 8 months ago
OscarSavolainen / Quantization-Tutorials
A bunch of coding tutorials for my YouTube videos on Neural Network Quantization.
☆16 Updated last year
olibartfast / computer-vision-triton-cpp-client
C++ application to perform computer vision tasks using Nvidia Triton Server for model inference
☆23 Updated last month
caijixueIT / CUDA_Learning_for_Freshman
☆11 Updated 3 months ago
Phoenix8215 / build_neural_network_from_scratch_CPP
A simple neural network built with the C++17 standard and the Eigen library, supporting both forward and backward propagation.
☆9 Updated 10 months ago
piDack / The-ans-for-Programming-Massively-Parallel-Processor
Answers to Programming Massively Parallel Processors, 2nd edition
☆32 Updated 3 years ago
iclementine / optimize_softmax
Optimize softmax in Triton in many cases
☆21 Updated 9 months ago
kilianhae / FlashAttention.C
Flash Attention in raw CUDA C beating PyTorch
☆22 Updated last year
zjd1988 / rknn_backend
☆17 Updated last year