manishucsd / py-codegen
☆15 Updated 4 months ago
Alternatives and similar repositories for py-codegen:
Users who are interested in py-codegen are comparing it to the libraries listed below.
- ☆48 Updated 10 months ago
- Extensible collectives library in Triton ☆77 Updated 4 months ago
- GEMM and Winograd based convolutions using CUTLASS ☆26 Updated 4 years ago
- ☆36 Updated last month
- MLIR-based partitioning system ☆58 Updated this week
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline. ☆94 Updated 6 months ago
- An extension library of WMMA API (Tensor Core API) ☆87 Updated 6 months ago
- A lightweight, Pythonic frontend for MLIR ☆80 Updated last year
- ☆64 Updated 2 months ago
- ☆67 Updated last month
- ☆180 Updated 6 months ago
- ☆84 Updated 9 months ago
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles. ☆43 Updated this week
- Experiments and prototypes associated with IREE or MLIR ☆51 Updated 5 months ago
- A library of GPU kernels for sparse matrix operations. ☆252 Updated 4 years ago
- Ahead of Time (AOT) Triton Math Library ☆50 Updated this week
- A sandbox for quick iteration and experimentation on projects related to IREE, MLIR, and LLVM ☆56 Updated 4 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated 10 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels. ☆127 Updated last year
- Learning about CUDA by writing PTX code. ☆33 Updated 11 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆38 Updated 8 months ago
- ☆49 Updated 5 months ago
- Fastest kernels written from scratch ☆131 Updated 2 months ago
- GPU Performance Advisor ☆63 Updated 2 years ago
- Dissecting NVIDIA GPU Architecture ☆84 Updated 2 years ago
- cuASR: CUDA Algebra for Semirings ☆35 Updated 2 years ago
- SGEMM that beats cuBLAS ☆68 Updated last week
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS. ☆49 Updated 6 years ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆133 Updated last year