cuda-mode / lectures
Material for cuda-mode lectures
☆2,401 · Updated 2 weeks ago
Related projects:
- CUDA-related news and material links ☆1,079 · Updated 2 weeks ago
- Puzzles for learning Triton ☆966 · Updated this week
- Tile primitives for speedy kernels ☆1,489 · Updated this week
- 📖 A curated list of Awesome LLM Inference Papers with code, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batc… ☆2,475 · Updated this week
- An ML Systems Onboarding list ☆491 · Updated last month
- 🎉 CUDA/C++ notes / technical blog: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, his… ☆1,140 · Updated this week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆2,333 · Updated 2 months ago
- Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton ☆1,190 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… ☆1,811 · Updated this week
- How to optimize some algorithms in CUDA. ☆1,443 · Updated this week
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆558 · Updated 5 months ago
- UNet diffusion model in pure CUDA ☆562 · Updated 2 months ago
- Schedule-Free Optimization in PyTorch ☆1,800 · Updated last month
- The full minitorch student suite. ☆1,849 · Updated last month
- FlashInfer: Kernel Library for LLM Serving ☆1,138 · Updated this week
- nanoGPT-style version of Llama 3.1 ☆1,162 · Updated last month
- CUDA Templates for Linear Algebra Subroutines ☆5,359 · Updated this week
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆541 · Updated last month
- PyTorch-native quantization and sparsity for training and inference ☆726 · Updated this week
- Solve puzzles. Improve your PyTorch. ☆3,056 · Updated 2 months ago
- Learn CUDA Programming, published by Packt ☆987 · Updated 8 months ago
- A native PyTorch library for large model training ☆1,544 · Updated this week
- A curated list for Efficient Large Language Models ☆1,113 · Updated this week
- Transformer-related optimizations, including BERT, GPT ☆5,773 · Updated 5 months ago
- What would you do with 1000 H100s... ☆816 · Updated 8 months ago
- Awesome LLM compression research papers and tools. ☆1,054 · Updated this week
- Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors a… ☆1,131 · Updated this week
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆1,180 · Updated 2 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆662 · Updated last month
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆2,292 · Updated this week