hummingtree / cuda-graph-with-dynamic-parameters
☆16 · Updated 2 years ago
Alternatives and similar repositories for cuda-graph-with-dynamic-parameters:
Users who are interested in cuda-graph-with-dynamic-parameters are comparing it to the libraries listed below.
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆130 · Updated 4 years ago
- ☆43 · Updated 4 years ago
- ☆91 · Updated 11 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels. ☆129 · Updated last year
- Experimental projects related to TensorRT ☆94 · Updated this week
- An extension library of WMMA API (Tensor Core API) ☆91 · Updated 8 months ago
- Dissecting NVIDIA GPU Architecture ☆90 · Updated 2 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 · Updated 4 years ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆129 · Updated last week
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 · Updated 9 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cuBLASXt, but ported to both NVIDIA and AMD GPUs. ☆31 · Updated 3 months ago
- Online CUDA Occupancy Calculator ☆74 · Updated 3 years ago
- Conversions to MLIR EmitC ☆128 · Updated 3 months ago
- ☆38 · Updated 3 years ago
- TPP experimentation on MLIR for linear algebra ☆121 · Updated last week
- Benchmark for measuring the performance of sparse and irregular memory access. ☆77 · Updated last month
- ☆51 · Updated 5 years ago
- ☆43 · Updated 4 years ago
- Development repository for the Open Earth Compiler ☆79 · Updated 4 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆105 · Updated 7 years ago
- ☆17 · Updated 5 years ago
- MLIR-based partitioning system ☆73 · Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆99 · Updated 3 weeks ago
- A hierarchical collective communications library with portable optimizations ☆32 · Updated 3 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆76 · Updated last year
- Third-party assembler and GEMM library for NVIDIA Kepler GPU ☆80 · Updated 5 years ago
- ☆48 · Updated 5 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆214 · Updated 3 years ago
- ☆53 · Updated 5 years ago