gpu-mode / lecture2Links

Obsolete version of CUDA-mode repo -- use cuda-mode/lectures instead

☆26

Alternatives and similar repositories for lecture2

Users that are interested in lecture2 are comparing it to the libraries listed below

Sorting:

gpu-mode / profiling-cuda-in-torch
☆177Updated last year
ulrichstern / cuda-convnet
Alex Krizhevsky's original code from Google Code
☆198Updated 9 years ago
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆320Updated 2 months ago
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆267Updated last year
andrewkchan / yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
☆533Updated 2 months ago
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆196Updated 5 months ago
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆437Updated 8 months ago
HenryNdubuaku / cuda-tutorials
CUDA tutorials for Maths & ML tutorials with examples, covers multi-gpus, fused attention, winograd convolution, reinforcement learning.
☆200Updated 5 months ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆215Updated 8 months ago
hkproj / pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
☆87Updated last year
hkproj / triton-flash-attention
☆219Updated 10 months ago
anyscale / e2e-llm-workflows
Fine-tune an LLM to perform batch inference and online serving.
☆113Updated 6 months ago
huggingface / picotron_tutorial
☆225Updated last month
linjames0 / Transformer-CUDA
An implementation of the transformer architecture onto an Nvidia CUDA kernel
☆195Updated 2 years ago
kyegomez / Python-Package-Template
A easy, reliable, fluid template for python packages complete with docs, testing suites, readme's, github workflows, linting and much muc…
☆195Updated last month
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆93Updated this week
suneeta-mall / deep_learning_at_scale
Contains hands-on example code for [O'reilly book "Deep Learning At Scale"](https://www.oreilly.com/library/view/deep-learning-at/9781098…
☆29Updated last year
hkproj / quantization-notes
Notes on quantization in neural networks
☆109Updated last year
mlops-discord / gpu-optimization-workshop
Slides, notes, and materials for the workshop
☆334Updated last year
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆305Updated 3 weeks ago
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆347Updated 6 months ago
1y33 / 100Days
GPU Kernels
☆209Updated 7 months ago
tcapelle / llm_recipes
A set of scripts and notebooks on LLM finetunning and dataset creation
☆111Updated last year
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆391Updated 5 months ago
cfregly / ai-performance-engineering
☆584Updated this week
triton-inference-server / triton_cli
Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen…
☆72Updated 2 weeks ago
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆244Updated 6 months ago
evintunador / triton_docs_tutorials
making the official triton tutorials actually comprehensible
☆73Updated 3 months ago
gpu-mode / triton-index
Cataloging released Triton kernels.
☆272Updated 2 months ago
aryagxr / cuda
coding CUDA everyday!
☆71Updated last week