mit-han-lab/parallel-computing-tutorial

☆178

Alternatives and similar repositories for parallel-computing-tutorial

Users that are interested in parallel-computing-tutorial are comparing it to the libraries listed below.

mit-han-lab / tinychat-tutorial
☆79 · Nov 5, 2024 · Updated last year
mit-han-lab / TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
☆958 · Jul 4, 2024 · Updated 2 years ago
PannenetsF / TQT
TQT's pytorch implementation.
☆22 · Dec 17, 2021 · Updated 4 years ago
LeiWang1999 / AutoGPTQ.tvm
GPTQ inference TVM kernel
☆41 · Apr 25, 2024 · Updated 2 years ago
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆445 · Mar 5, 2026 · Updated 4 months ago
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆400 · Jul 10, 2025 · Updated last year
yifanlu0227 / MIT-6.5940
All Homeworks for TinyML and Efficient Deep Learning Computing 6.5940 • Fall • 2023 • https://efficientml.ai
☆200 · Dec 2, 2023 · Updated 2 years ago
yifanlu0227 / LLaMA2-7B-on-laptop
Lab 5 project of MIT-6.5940, deploying LLaMA2-7B-chat on one's laptop with TinyChatEngine.
☆18 · Dec 1, 2023 · Updated 2 years ago
mit-han-lab / tinyengine
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep L…
☆951 · Nov 27, 2024 · Updated last year
tlc-pack / libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency
☆113 · Sep 10, 2024 · Updated last year
chhzh123 / ptc-tutorial
PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo
☆17 · Mar 13, 2023 · Updated 3 years ago
mit-han-lab / smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆1,670 · Jul 12, 2024 · Updated 2 years ago
dcaox / MIT6.5940
Model acceleration / model compression (all labs completed)
☆11 · Dec 24, 2023 · Updated 2 years ago
megvii-research / IntLLaMA
IntLLaMA: A fast and light quantization solution for LLaMA
☆19 · Jul 21, 2023 · Updated 3 years ago
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,592 · Jul 17, 2025 · Updated last year
LeiWang1999 / TVM.CMakeExtend
Tutorials on extending and importing TVM with CMake include dependencies.
☆16 · Oct 11, 2024 · Updated last year
ColfaxResearch / cutlass-kernels
☆269 · Jul 11, 2024 · Updated 2 years ago
Seeker0472 / ysyx-linux
Porting the Linux kernel to NEMU!
☆23 · Jun 1, 2025 · Updated last year
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆226 · Aug 1, 2024 · Updated last year
hahnyuan / LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod…
☆661 · Sep 11, 2024 · Updated last year
Guangxuan-Xiao / torch-int
This repository contains integer operators on GPUs for PyTorch.
☆235 · Sep 29, 2023 · Updated 2 years ago
tpoisonooo / tengine-pipe
Tengine pipe is a helper tool for quickly producing demos
☆11 · Jul 15, 2021 · Updated 5 years ago
mlc-ai / tirx-kernels
ML kernels and benchmarking infrastructure written in TIRx
☆66 · Updated this week
triton-lang / kernels
☆115 · Mar 12, 2026 · Updated 4 months ago
Liu-xiandong / How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…
☆1,329 · Jul 29, 2023 · Updated 2 years ago
xdit-project / DiTCacheAnalysis
An auxiliary project analyzing the characteristics of KV in DiT Attention.
☆34 · Nov 29, 2024 · Updated last year
66RING / tiny-flash-attention
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
☆527 · Jan 20, 2026 · Updated 6 months ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195 · Feb 11, 2026 · Updated 5 months ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆145 · Mar 31, 2023 · Updated 3 years ago
baichen318 / arch-explorer
ArchExplorer: Microarchitecture Exploration Via Bottleneck Analysis
☆34 · Feb 20, 2024 · Updated 2 years ago
zhuzilin / pytorch-malloc
An external memory allocator example for PyTorch.
☆16 · Aug 10, 2025 · Updated 11 months ago
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,344 · Aug 28, 2025 · Updated 10 months ago
tpoisonooo / how-to-optimize-gemm
row-major matmul optimization
☆743 · May 14, 2026 · Updated 2 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,494 · Updated this week
sjtu-epcc / DVABatch
☆21 · May 13, 2022 · Updated 4 years ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year
howardlau1999 / yatcpu
Yet another toy CPU.
☆92 · Dec 10, 2023 · Updated 2 years ago
parsa-epfl / flexus
Contains the code for the Flexus cycle-accurate simulator, used in QFlex.
☆14 · Updated this week
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆165 · Sep 15, 2023 · Updated 2 years ago