NVIDIA / cudnn-frontend

cudnn_frontend provides a C++ wrapper for the cuDNN backend API, along with samples showing how to use it.

☆596

Alternatives and similar repositories for cudnn-frontend

Users who are interested in cudnn-frontend are comparing it to the libraries listed below.


ROCm / composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
☆444 · Updated this week
NVIDIA / NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…
☆427 · Updated last week
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆345 · Updated this week
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆691 · Updated last week
NVIDIA / nsight-training
Training material for Nsight developer tools
☆162 · Updated 11 months ago
intel / intel-xpu-backend-for-triton
OpenAI Triton backend for Intel® GPUs
☆196 · Updated this week
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
☆362 · Updated 3 years ago
cloudcores / CuAssembler
An unofficial CUDA assembler for all generations of SASS, hopefully ：）
☆523 · Updated 2 years ago
ROCm / AMDMIGraphX
AMD's graph optimization engine.
☆234 · Updated this week
NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆763 · Updated 5 months ago
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆445 · Updated 10 months ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆211 · Updated last year
onnx / onnx-mlir
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
☆887 · Updated this week
NVIDIA / nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
☆492 · Updated 3 months ago
NVIDIA / TensorRT-Model-Optimizer
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. …
☆1,078 · Updated 2 weeks ago
ROCm / rccl
ROCm Communication Collectives Library (RCCL)
☆352 · Updated last week
RRZE-HPC / gpu-benches
Collection of benchmarks to measure basic GPU capabilities
☆398 · Updated 5 months ago
NVIDIA / TensorRT-Incubator
Experimental projects related to TensorRT
☆108 · Updated this week
ROCm / HIPIFY
HIPIFY: Convert CUDA to Portable C++ Code
☆604 · Updated this week
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆392 · Updated last year
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆260 · Updated this week
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.
☆369 · Updated 7 months ago
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆340 · Updated 3 years ago
ROCm / aiter
AI Tensor Engine for ROCm
☆240 · Updated this week
tensorflow / mlir-hlo
☆420 · Updated this week
KnowingNothing / MatmulTutorial
An Easy-to-Understand TensorOp Matmul Tutorial
☆368 · Updated 10 months ago
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,415 · Updated this week
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆226 · Updated 3 years ago
ROCm / triton
Development repository for the Triton language and compiler
☆126 · Updated this week
siboehm / SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
☆782 · Updated last year