stanford-cs149 / asst3
Stanford CS149 -- Assignment 3
Related projects
Alternatives and complementary repositories for asst3
- Stanford CS149 -- Assignment 1
- Stanford CS149 -- Assignment 2
- Introduction to CUDA programming and debugging
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
- TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
- IMPACT GPU Algorithms Teaching Labs
- PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity
- An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
- DietCode Code Release
- Code base and slides for ECE408: Applied Parallel Programming On GPU.
- Study of Ampere's sparse matmul
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
- Dissecting NVIDIA GPU Architecture
- SparseTIR: Sparse Tensor Compiler for Deep Learning
- Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616
- An implementation of HPL-AI Mixed-Precision Benchmark based on hpl-2.3
- The code for our paper "Neural Architecture Search as Program Transformation Exploration"
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving GPU Utilization while Ensuring QoS
- Examples of CUDA implementations using CUTLASS CuTe
- Personal Notes for Learning HPC & Parallel Computation (new content actively being added)
- CUDA Matrix Multiplication Optimization
- ngAP's artifact for ASPLOS'24
- Playing with GEMM in TVM
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)