dlsyscourse / hw1
☆13Updated last month
Alternatives and similar repositories for hw1
Users that are interested in hw1 are comparing it to the libraries listed below
- LLM theoretical performance analysis tools supporting params, FLOPs, memory, and latency analysis.☆112Updated 4 months ago
- Code base and slides for ECE408: Applied Parallel Programming On GPU.☆137Updated 4 years ago
- ☆51Updated 2 months ago
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, …☆123Updated last year
- A simple calculation for LLM MFU.☆50Updated 2 months ago
- ☆41Updated last month
- ☆210Updated last year
- Systems for GenAI☆147Updated 7 months ago
- Machine Learning Compiler Road Map☆45Updated 2 years ago
- ☆102Updated last year
- A practical way of learning Swizzle☆33Updated 9 months ago
- A baseline repository of Auto-Parallelism in Training Neural Networks☆147Updated 3 years ago
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral…☆66Updated last year
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆143Updated 2 months ago
- Implement Flash Attention using Cute.☆96Updated 11 months ago
- Codes & examples for "CUDA - From Correctness to Performance"☆117Updated last year
- An annotated nano_vllm repository, with MiniCPM4 adaptation and support for registering new models☆95Updated 3 months ago
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap…☆279Updated 8 months ago
- A PyTorch-like deep learning framework. Just for fun.☆156Updated 2 years ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.☆127Updated 6 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity☆223Updated 2 years ago
- ☆273Updated 3 weeks ago
- ☆97Updated 7 months ago
- A minimal implementation of vllm.☆61Updated last year
- GPTQ inference TVM kernel☆39Updated last year
- Tutorials for writing high-performance GPU operators in AI frameworks.☆134Updated 2 years ago
- ☆176Updated 2 years ago
- A curated list of awesome projects and papers for distributed training or inference☆250Updated last year
- ☆81Updated 7 months ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…☆75Updated 4 years ago