microsoft / cusync
☆20Updated 8 months ago
Related projects
Alternatives and complementary repositories for cusync
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆22Updated last month
- TiledKernel is a code generation library based on macro kernels and a memory hierarchy graph data structure.☆19Updated 6 months ago
- Triton to TVM transpiler.☆16Updated last month
- ☆29Updated 2 years ago
- HeteroCL-MLIR dialect for accelerator design☆40Updated last month
- ThrillerFlow is a Dataflow Analysis and Codegen Framework written in Rust.☆10Updated last month
- PTX-EMU is a simple emulator for CUDA programs.☆24Updated 10 months ago
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices.☆103Updated 3 months ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆13Updated 5 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo☆19Updated last year
- GPU Performance Advisor☆63Updated 2 years ago
- An Attention Superoptimizer☆20Updated 6 months ago
- A GPU FP32 computation method with Tensor Cores.☆18Updated last year
- A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https:/…☆19Updated last year
- Optimize tensor programs fast with Felix, a gradient descent autotuner.☆19Updated 6 months ago
- ☆18Updated last month
- TileFlow is a performance analysis tool based on Timeloop for fusion dataflows☆55Updated 7 months ago
- Microsoft Collective Communication Library☆52Updated last month
- ☆32Updated 2 years ago
- Data-Centric MLIR dialect☆38Updated last year
- ☆38Updated 4 years ago
- ☆47Updated 5 years ago
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch☆29Updated 3 months ago
- ☆40Updated 3 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS☆17Updated 2 years ago
- An IR for efficiently simulating distributed ML computation.☆25Updated 10 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆65Updated 10 months ago
- Experiments and prototypes associated with IREE or MLIR☆49Updated 3 months ago
- An extension library of WMMA API (Tensor Core API)☆83Updated 4 months ago