manishucsd / py-codegen
☆14Updated last month
Related projects
Alternatives and complementary repositories for py-codegen
- ☆48Updated 8 months ago
- GEMM and Winograd based convolutions using CUTLASS☆25Updated 4 years ago
- extensible collectives library in triton☆72Updated last month
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …☆63Updated 2 years ago
- cuASR: CUDA Algebra for Semirings☆34Updated 2 years ago
- Memory Optimizations for Deep Learning (ICML 2023)☆60Updated 8 months ago
- ☆11Updated 3 years ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline.☆90Updated 4 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆35Updated 6 months ago
- ☆169Updated 4 months ago
- ☆80Updated 7 months ago
- ☆45Updated 2 weeks ago
- ☆48Updated this week
- An extension library of WMMA API (Tensor Core API)☆84Updated 4 months ago
- Benchmarks to capture important workloads.☆28Updated 5 months ago
- FlexAttention w/ FlashAttention3 Support☆27Updated last month
- A lightweight, Pythonic, frontend for MLIR☆80Updated last year
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆46Updated 2 months ago
- A library of GPU kernels for sparse matrix operations.☆249Updated 3 years ago
- ☆12Updated last month
- A Winograd Minimal Filter Implementation in CUDA☆23Updated 3 years ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores☆43Updated 11 months ago
- Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction.☆11Updated last year
- ☆38Updated 4 years ago
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS.☆49Updated 6 years ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores☆30Updated 3 months ago
- 🎃 GPU load-balancing library for regular and irregular computations.☆57Updated 5 months ago
- ☆55Updated 5 months ago
- Sparsity support for PyTorch☆31Updated this week
- Customized matrix multiplication kernels☆53Updated 2 years ago