NVIDIA / HMM_sample_code
CUDA 12.2 HMM demos
☆19 · Updated 5 months ago
Alternatives and similar repositories for HMM_sample_code:
Users interested in HMM_sample_code are also comparing it to the repositories listed below.
- ☆36 · Updated this week
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 · Updated 3 months ago
- Code for Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB). The outdated wr… ☆8 · Updated last year
- FP64-equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆54 · Updated 4 months ago
- GPTQ inference TVM kernel ☆38 · Updated 8 months ago
- An extension library of the WMMA API (Tensor Core API) ☆87 · Updated 6 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆30 · Updated last month
- An Attention Superoptimizer ☆20 · Updated 8 months ago
- An IR for efficiently simulating distributed ML computation. ☆25 · Updated last year
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆63 · Updated 2 years ago
- ☆48 · Updated 10 months ago
- Benchmark tests supporting the TiledCUDA library. ☆12 · Updated last month
- Standalone Flash Attention v2 kernel without a libtorch dependency ☆99 · Updated 4 months ago
- ☆19 · Updated 3 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. ☆57 · Updated last month
- ☆20 · Updated last year
- ☆8 · Updated last year
- CUDA Templates for Linear Algebra Subroutines ☆11 · Updated this week
- ☆35 · Updated last month
- ☆57 · Updated 7 months ago
- An external memory allocator example for PyTorch. ☆14 · Updated 3 years ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆16 · Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆75 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆37 · Updated 8 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆21 · Updated last month
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆87 · Updated 10 months ago
- RCCL Performance Benchmark Tests ☆55 · Updated this week
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆62 · Updated 6 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆18 · Updated 3 years ago
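One entry above benchmarks the "Online normalizer calculation for softmax" paper (Milakov and Gimelshein), which computes the softmax max and normalizer in a single pass instead of two. A minimal Python sketch of that one-pass idea (the function name is mine, not from the benchmark repo):

```python
import math

def online_softmax(xs):
    """Single-pass softmax normalizer: tracks the running maximum m and
    the normalizer d = sum(exp(x - m)), rescaling d whenever a new
    maximum is seen, so the input is read only once."""
    m = float("-inf")  # running maximum
    d = 0.0            # running normalizer, relative to m
    for x in xs:
        m_new = max(m, x)
        # rescale the old partial sum to the new max, then add the new term
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    # final pass only to materialize the probabilities
    return [math.exp(x - m) / d for x in xs]
```

The rescaling step `d * exp(m - m_new)` is what makes the fusion numerically safe: all exponentials stay shifted by the current maximum, exactly as in the classic two-pass formulation.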