yarkhinephyo / 15-418-parallel-computing-notes
Notes on CMU Parallel Computer Architecture
☆25 Updated 2 years ago
Alternatives and similar repositories for 15-418-parallel-computing-notes:
Users who are interested in 15-418-parallel-computing-notes are comparing it to the repositories listed below
- Homework solutions for CMU 10-414/714 – Deep Learning Systems: Algorithms and Implementation ☆43 Updated 2 years ago
- Stanford CS149 -- Assignment 1 ☆90 Updated 5 months ago
- Code base and slides for ECE408: Applied Parallel Programming on GPU. ☆120 Updated 3 years ago
- CS149 xmake version ☆42 Updated last year
- Learning materials for Stanford CS149: Parallel Computing ☆213 Updated 3 years ago
- HPC-Lab for the High Performance Computing course, Spring 2023, Tsinghua University. Introduction to High Performance Computing @ THU. ☆21 Updated last year
- A PyTorch-like deep learning framework. Just for fun. ☆147 Updated last year
- DGEMM on KNL, achieving 75% of MKL performance ☆16 Updated 2 years ago
- Personal Notes for Learning HPC & Parallel Computation [Actively Adding New Content] ☆61 Updated 2 years ago
- Codes & examples for "CUDA - From Correctness to Performance" ☆89 Updated 5 months ago
- System paper reading notes ☆242 Updated 3 years ago
- ☆46 Updated last year
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆179 Updated 2 months ago
- This is a cross-chip platform collection of operators and a unified neural network library. ☆16 Updated last year
- ☆70 Updated 2 years ago
- Systems for GenAI ☆123 Updated 2 weeks ago
- ☆160 Updated last year
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆53 Updated 7 months ago
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap… ☆238 Updated 3 weeks ago
- Hands-on model tuning with TVM, with profiling on a Mac M1, x86 CPU, and GTX-1080 GPU. ☆45 Updated last year
- ☆10 Updated 2 years ago
- My solutions to the assignments of CMU 10-714 Deep Learning Systems 2022 ☆36 Updated last year
- ☆229 Updated last month
- Summary of some awesome work for optimizing LLM inference ☆64 Updated 2 weeks ago
- Machine Learning Compiler Road Map ☆43 Updated last year
- PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems. ☆18 Updated last week
- Lab 5 project of MIT-6.5940, deploying LLaMA2-7B-chat on one's laptop with TinyChatEngine. ☆16 Updated last year
- This repository stores personal notes and annotated papers from daily research. ☆116 Updated this week
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆50 Updated last year
- Xiao's CUDA Optimization Guide [Actively Adding New Content] ☆271 Updated 2 years ago