NVIDIA / HMM_sample_code
CUDA 12.2 HMM demos
☆20 · Updated last year
Alternatives and similar repositories for HMM_sample_code
Users interested in HMM_sample_code compare it to the libraries listed below.
- An easily extensible framework for understanding and optimizing CUDA operators, intended for learning purposes only. ☆17 · Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆97 · Updated 3 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 · Updated 11 months ago
- Benchmark tests supporting the TiledCUDA library. ☆17 · Updated 10 months ago
- ☆42 · Updated 4 months ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆40 · Updated 2 years ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆82 · Updated last week
- GPTQ inference TVM kernel ☆40 · Updated last year
- An Attention Superoptimizer ☆22 · Updated 8 months ago
- An extension library of WMMA API (Tensor Core API) ☆105 · Updated last year
- ☆90 · Updated 10 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32 · Updated 5 months ago
- ☆27 · Updated 7 months ago
- RCCL Performance Benchmark Tests ☆76 · Updated last week
- GPU Performance Advisor ☆66 · Updated 3 years ago
- ☆46 · Updated 9 months ago
- ☆31 · Updated this week
- Bandwidth test for ROCm ☆66 · Updated this week
- ☆25 · Updated last year
- ☆50 · Updated last year
- An experimental communicating attention kernel based on DeepEP. ☆34 · Updated last month
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆41 · Updated last year
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆117 · Updated this week
- Standalone Flash Attention v2 kernel without libtorch dependency ☆110 · Updated last year
- ☆23 · Updated last month
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts". ☆29 · Updated this week
- A practical way of learning Swizzle ☆28 · Updated 7 months ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 · Updated last year
- ☆57 · Updated 8 months ago
- ☆75 · Updated 4 years ago