simonaertssen / MIT-6.172-Performance-Engineering-of-Software-Systems
6.172 is an 18-unit class that provides a hands-on, project-based introduction to building scalable and high-performance software systems. Topics include performance analysis, algorithmic techniques for high performance, instruction-level optimizations, caching optimizations, parallel programming, and building scalable systems. The course progra…
☆47Updated 4 years ago
Alternatives and similar repositories for MIT-6.172-Performance-Engineering-of-Software-Systems
Users who are interested in MIT-6.172-Performance-Engineering-of-Software-Systems are comparing it to the repositories listed below
- Solution of Programming Massively Parallel Processors☆49Updated last year
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]☆76Updated 3 years ago
- Stanford CS149 -- Assignment 1☆142Updated 2 months ago
- MIT 6.172 Performance Engineering of Software Systems☆16Updated 4 years ago
- Learning materials for Stanford CS149 : Parallel Computing☆265Updated 4 years ago
- Learning material for CMU10-714: Deep Learning System☆293Updated last year
- A PyTorch-like deep learning framework. Just for fun.☆157Updated 2 years ago
- IMPACT GPU Algorithms Teaching Labs☆59Updated 2 years ago
- system paper reading notes☆246Updated 3 months ago
- Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading.☆160Updated 3 years ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…☆76Updated 4 years ago
- 📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software☆59Updated 10 months ago
- Flash Attention from Scratch on CUDA Ampere☆115Updated 4 months ago
- 🌈 Solutions of LeetGPU☆62Updated last week
- ☆51Updated 4 months ago
- Principles and Methodologies for Serial Performance Optimization (OSDI '25)☆21Updated 7 months ago
- deep learning framework from scratch☆32Updated 3 years ago
- Xiao's CUDA Optimization Guide [NO LONGER ADDING NEW CONTENT]☆322Updated 3 years ago
- ☆279Updated 2 months ago
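- Instead of scrolling on your phone after work in the evening, learn something. Series 1: the CUDA computing framework CUFX (Cuda Framework eXtended).☆16Updated last year
- "Write Your Own AI Compiler"☆32Updated last year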
- Codes & examples for "CUDA - From Correctness to Performance"☆120Updated last year
- ☆79Updated 3 years ago
- A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs☆150Updated last week
- ☆48Updated 2 years ago
- ☆67Updated last year
- Systems for GenAI☆151Updated 8 months ago
- Main Book repository for the Parallel and High Performance Computing book, Manning Publications☆221Updated 3 years ago
- ☆24Updated 2 years ago
- DGEMM on KNL, achieving 75% of MKL performance☆19Updated 3 years ago