☆3,786 · Mar 11, 2026 · Updated 3 months ago
Alternatives and similar repositories for cuda-course
Users that are interested in cuda-course are comparing it to the libraries listed below.
- ☆496 · Dec 18, 2025 · Updated 6 months ago
- Material for gpu-mode lectures ☆6,262 · Jun 15, 2026 · Updated 2 weeks ago
- Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch ☆963 · Jul 19, 2023 · Updated 2 years ago
- LLM training in simple, raw C/CUDA ☆30,362 · Jun 26, 2025 · Updated last year
- Fast CUDA matrix multiplication from scratch ☆1,222 · Sep 2, 2025 · Updated 9 months ago
- 100 days of building GPU kernels! ☆607 · Apr 27, 2025 · Updated last year
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 ☆11,324 · Updated this week
- GPU programming related news and material links ☆2,206 · Jun 15, 2026 · Updated 2 weeks ago
- This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast… ☆454 · Feb 22, 2025 · Updated last year
- Samples for CUDA Developers which demonstrates features in CUDA Toolkit ☆9,340 · May 27, 2026 · Updated last month
- Learnings and programs related to CUDA ☆438 · Jun 29, 2025 · Updated last year
- CUDA Learning guide ☆561 · Jun 20, 2024 · Updated 2 years ago
- Solve puzzles. Learn CUDA. ☆12,258 · Sep 1, 2024 · Updated last year
- ☆429 · Apr 10, 2025 · Updated last year
- A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Proc… ☆931 · Mar 29, 2025 · Updated last year
- NVIDIA curated collection of educational resources related to general purpose GPU programming. ☆1,787 · Updated this week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆491 · Mar 10, 2025 · Updated last year
- Puzzles for learning Triton ☆2,499 · Apr 1, 2026 · Updated 3 months ago
- LLM101n: Let's build a Storyteller ☆37,383 · Aug 1, 2024 · Updated last year
- Machine Learning Engineering Open Book ☆18,167 · May 18, 2026 · Updated last month
- Development repository for the Triton language and compiler ☆19,525 · Updated this week
- GPU Kernels ☆226 · Apr 27, 2025 · Updated last year
- Implement a ChatGPT-like LLM in PyTorch from scratch, step by step ☆97,621 · Jun 2, 2026 · Updated 3 weeks ago
- FlashInfer: Kernel Library for LLM Serving ☆5,867 · Updated this week
- Some CUDA example code with READMEs. ☆180 · Jun 9, 2026 · Updated 3 weeks ago
- Complete solutions to the Programming Massively Parallel Processors Edition 4 ☆796 · Jun 18, 2025 · Updated last year
- Tile primitives for speedy kernels ☆3,497 · Jun 15, 2026 · Updated 2 weeks ago
- CUDA Library Samples ☆2,446 · Jun 10, 2026 · Updated 3 weeks ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆165 · Oct 19, 2023 · Updated 2 years ago
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆1,010 · Aug 19, 2024 · Updated last year
- Notes and code for Programming Massively Parallel Processors ☆13 · Mar 29, 2025 · Updated last year
- Simple problems implemented in CUDA C ☆39 · Apr 7, 2025 · Updated last year
- The Structure and Interpretation of Tensor Programs: The Hacker's Accelerated Introduction to Deep Learning and Deep Learning Systems ☆80 · Updated this week
- how to optimize some algorithm in cuda. ☆3,102 · Updated this week
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆2,228 · Aug 26, 2025 · Updated 10 months ago
- A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API ☆16,483 · Aug 8, 2024 · Updated last year
- Efficient Triton Kernels for LLM Training ☆6,456 · Jun 23, 2026 · Updated last week
- Fast and memory-efficient exact attention ☆24,221 · Jun 22, 2026 · Updated last week
- Official code repo for the O'Reilly Book - "Hands-On Large Language Models" ☆27,269 · Apr 24, 2026 · Updated 2 months ago