dlsyscourse / hw1
☆8 Updated 10 months ago
Alternatives and similar repositories for hw1
Users that are interested in hw1 are comparing it to the libraries listed below
- ☆38 Updated last year
- A simple calculation for LLM MFU. ☆39 Updated 4 months ago
- llm theoretical performance analysis tools and support params, flops, memory and latency analysis. ☆98 Updated this week
- Machine Learning Compiler Road Map ☆43 Updated last year
- Code base and slides for ECE408:Applied Parallel Programming On GPU. ☆127 Updated 4 years ago
- ☆206 Updated 7 months ago
- A practical way of learning Swizzle ☆21 Updated 5 months ago
- GPTQ inference TVM kernel ☆40 Updated last year
- A TVM-like CUDA/C code generator. ☆9 Updated 3 years ago
- ☆90 Updated 3 months ago
- ☆84 Updated 2 months ago
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆129 Updated last year
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, … ☆115 Updated last year
- Simple PyTorch graph capturing. ☆20 Updated 2 years ago
- Implement Flash Attention using Cute. ☆88 Updated 7 months ago
- ☆96 Updated 10 months ago
- Systems for GenAI ☆142 Updated 2 months ago
- ☆40 Updated 4 years ago
- Stateful LLM Serving ☆76 Updated 4 months ago
- A lightweight design for computation-communication overlap. ☆146 Updated 3 weeks ago
- nnScaler: Compiling DNN models for Parallel Training ☆113 Updated last week
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆93 Updated this week
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆144 Updated 3 years ago
- A minimal implementation of vllm. ☆49 Updated 11 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆108 Updated last year
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆58 Updated 11 months ago
- ☆172 Updated last year
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling ☆35 Updated this week
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆71 Updated 4 years ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆52 Updated 10 months ago