aschuh703 / ECE408Links
☆46Updated last year
Alternatives and similar repositories for ECE408
Users who are interested in ECE408 are comparing it to the repositories listed below
- Solution of Programming Massively Parallel Processors☆47Updated last year
- DGEMM on KNL, achieve 75% MKL☆18Updated 3 years ago
- Code base and slides for ECE408:Applied Parallel Programming On GPU.☆124Updated 3 years ago
- Homework solutions for CMU 10-414/714 – Deep Learning Systems: Algorithms and Implementation☆43Updated 2 years ago
- A PyTorch-like deep learning framework. Just for fun.☆154Updated last year
- ☆108Updated last week
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]☆67Updated 2 years ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores☆51Updated last year
- Codes & examples for "CUDA - From Correctness to Performance"☆98Updated 7 months ago
- ☆142Updated 5 months ago
- ☆31Updated 11 months ago
- ☆27Updated last year
- ☆121Updated 5 months ago
- Examples of CUDA implementations by Cutlass CuTe☆188Updated 4 months ago
- Some source code about matrix multiplication implementation on CUDA☆34Updated 6 years ago
- Xiao's CUDA Optimization Guide [Active Adding New Contents]☆298Updated 2 years ago
- ☆112Updated last year
- An Easy-to-understand TensorOp Matmul Tutorial☆359Updated 8 months ago
- ☆64Updated 4 months ago
- Learning materials for Stanford CS149 : Parallel Computing☆224Updated 3 years ago
- ☆38Updated 10 months ago
- LLM serving cluster simulator☆102Updated last year
- ☆142Updated 11 months ago
- Lab 5 project of MIT-6.5940, deploying LLaMA2-7B-chat on one's laptop with TinyChatEngine.☆16Updated last year
- A baseline repository of Auto-Parallelism in Training Neural Networks☆142Updated 2 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.☆88Updated 2 years ago
- GEMM by WMMA (tensor core)☆12Updated 2 years ago
- ☆51Updated 5 years ago
- Hands-On Practical MLIR Tutorial☆25Updated 10 months ago
- [EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization☆14Updated last month