dlsyscourse / hw0
☆21, updated 4 months ago
Related projects:
- Tutorials for writing high-performance GPU operators in AI frameworks. (☆118, updated last year)
- Imperative deep learning framework with customized GPU and CPU backends. (☆28, updated last year)
- Cataloging released Triton kernels. (☆111, updated 3 weeks ago)
- Flash attention tutorials written in Python, Triton, CUDA, and CUTLASS. (☆159, updated 3 months ago)
- Collection of kernels written in the Triton language. (☆48, updated 2 weeks ago)
- Examples and exercises from the book Programming Massively Parallel Processors: A Hands-on Approach by David B. Kirk and Wen-mei W. Hwu (T… (☆33, updated 3 years ago)
- Stanford CS149 -- Assignment 1. (☆35, updated 11 months ago)
- Memory Optimizations for Deep Learning (ICML 2023). (☆58, updated 6 months ago)
- A PyTorch-like deep learning framework. Just for fun. (☆128, updated 11 months ago)
- Machine Learning Compiler Road Map. (☆40, updated last year)
- Applied AI experiments and examples for PyTorch. (☆123, updated last month)
- Penn CIS 5650 (GPU Programming and Architecture) Final Project. (☆21, updated 9 months ago)
- Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. Here is a list of pap… (☆153, updated this week)
- Code base and slides for ECE408: Applied Parallel Programming on GPU. (☆113, updated 3 years ago)
- TensorRT LLM Benchmark Configuration. (☆10, updated last month)
- CUDA Matrix Multiplication Optimization. (☆118, updated 2 months ago)
- TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles. (☆114, updated last week)
- A deep learning framework built from scratch. (☆24, updated 2 years ago)
- Performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios. (☆20, updated last week)
- An easy-to-understand TensorOp matmul tutorial. (☆265, updated this week)
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity. (☆166, updated 11 months ago)