gpu-mode / lecture2
Obsolete version of CUDA-mode repo -- use cuda-mode/lectures instead
★26 · Updated last year
Alternatives and similar repositories for lecture2
Users interested in lecture2 are comparing it to the repositories listed below.
- A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ★323 · Updated 2 months ago
- ★178 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ★267 · Updated 2 weeks ago
- ★225 · Updated 3 weeks ago
- Fine-tune an LLM to perform batch inference and online serving. ★115 · Updated 6 months ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ★539 · Updated 3 months ago
- Alex Krizhevsky's original code from Google Code ★197 · Updated 9 years ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ★441 · Updated 9 months ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ★161 · Updated 3 weeks ago
- ★227 · Updated 11 months ago
- An extension of the nanoGPT repository for training small MoE models. ★218 · Updated 9 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ★195 · Updated 6 months ago
- Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.) ★53 · Updated last year
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ★112 · Updated last year
- Simple MPI implementation for prototyping or learning ★292 · Updated 4 months ago
- Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models. ★139 · Updated last year
- CUDA tutorials for maths & ML with examples; covers multi-GPU, fused attention, Winograd convolution, and reinforcement learning. ★205 · Updated 6 months ago
- Distributed training (multi-node) of a Transformer model ★90 · Updated last year
- Slides, notes, and materials for the workshop ★336 · Updated last year
- Coding CUDA every day! ★71 · Updated last week
- An implementation of the transformer architecture as an Nvidia CUDA kernel ★196 · Updated 2 years ago
- ★86 · Updated last month
- Cataloging released Triton kernels. ★277 · Updated 3 months ago
- Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and L… ★17 · Updated last year
- Efficient LLM inference over long sequences ★393 · Updated 5 months ago
- LLaMA 2 implemented from scratch in PyTorch ★363 · Updated 2 years ago
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ★321 · Updated last month
- GPU Kernels ★210 · Updated 7 months ago
- Notes on quantization in neural networks ★113 · Updated 2 years ago
- A comprehensive deep dive into the world of tokens ★227 · Updated last year