gpu-mode / lecture2
Obsolete version of CUDA-mode repo -- use cuda-mode/lectures instead
☆25 · Updated last year
Alternatives and similar repositories for lecture2
Users interested in lecture2 are comparing it to the repositories listed below.
- ☆159 · Updated last year
- Fine-tune an LLM to perform batch inference and online serving. ☆112 · Updated 3 weeks ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆59 · Updated 2 months ago
- Distributed training (multi-node) of a Transformer model ☆71 · Updated last year
- ☆174 · Updated 5 months ago
- experiments with inference on llama ☆104 · Updated last year
- ☆213 · Updated 5 months ago
- Alex Krizhevsky's original code from Google Code ☆192 · Updated 9 years ago
- ☆159 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week
- ☆41 · Updated last month
- A set of scripts and notebooks on LLM finetuning and dataset creation ☆111 · Updated 8 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆137 · Updated 10 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆185 · Updated last year
- Efficient LLM Inference over Long Sequences ☆378 · Updated 2 weeks ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆377 · Updated 2 weeks ago
- LoRA and DoRA from Scratch Implementations ☆204 · Updated last year
- LLM training in simple, raw C/CUDA ☆99 · Updated last year
- ☆193 · Updated 4 months ago
- Cataloging released Triton kernels. ☆238 · Updated 5 months ago
- An extension of the nanoGPT repository for training small MOE models. ☆152 · Updated 3 months ago
- Data preparation code for Amber 7B LLM ☆91 · Updated last year
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆119 · Updated this week
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆64 · Updated 2 weeks ago
- Fast low-bit matmul kernels in Triton ☆322 · Updated last week
- Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and L… ☆16 · Updated last year
- ☆219 · Updated this week
- ML/DL Math and Method notes ☆61 · Updated last year
- Load compute kernels from the Hub ☆191 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆304 · Updated 3 weeks ago