NVIDIA / TensorRT-Incubator

Experimental projects related to TensorRT

☆108

Alternatives and similar repositories for TensorRT-Incubator

Users who are interested in TensorRT-Incubator are comparing it to the libraries listed below.

microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆260 Updated this week
ColfaxResearch / cutlass-kernels
☆227 Updated last year
ColfaxResearch / cfx-article-src
☆127 Updated 2 months ago
RRZE-HPC / gpu-benches
collection of benchmarks to measure basic GPU capabilities
☆401 Updated 5 months ago
Cambricon / triton-linalg
Development repository for the Triton-Linalg conversion
☆190 Updated 5 months ago
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆450 Updated 10 months ago
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆369 Updated 10 months ago
ROCm / composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
☆444 Updated this week
yifuwang / symm-mem-recipes
☆102 Updated 7 months ago
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆226 Updated 3 years ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆213 Updated last year
pranjalssh / fast.cu
Fastest kernels written from scratch
☆308 Updated 4 months ago
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆345 Updated this week
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆340 Updated 3 years ago
tlc-pack / relax
☆196 Updated 2 years ago
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99 Updated last year
triton-lang / triton-cpu
An experimental CPU backend for Triton
☆138 Updated 2 months ago
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
☆362 Updated 3 years ago
NVIDIA / online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper
☆95 Updated 7 years ago
DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆214 Updated last month
microsoft / microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
☆262 Updated last month
sunlex0717 / DissectingTensorCores
☆106 Updated last year
leimao / CUTLASS-Examples
CUTLASS and CuTe Examples
☆65 Updated 2 weeks ago
reed-lau / cute-gemm
☆128 Updated 7 months ago
intel / intel-xpu-backend-for-triton
OpenAI Triton backend for Intel® GPUs
☆197 Updated this week
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆370 Updated 7 months ago
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆392 Updated last year
tensorflow / mlir-hlo
☆420 Updated this week
OpenPPL / ppl.llm.kernel.cuda
☆149 Updated 6 months ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆183 Updated 6 months ago