☆3,490 · Mar 11, 2026 · Updated last month
Alternatives and similar repositories for cuda-course
Users who are interested in cuda-course are comparing it to the repositories listed below.
- ☆466 · Dec 18, 2025 · Updated 3 months ago
- Material for gpu-mode lectures ☆5,923 · Feb 1, 2026 · Updated 2 months ago
- Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch ☆942 · Jul 19, 2023 · Updated 2 years ago
- LLM training in simple, raw C/CUDA ☆29,359 · Jun 26, 2025 · Updated 9 months ago
- Fast CUDA matrix multiplication from scratch ☆1,119 · Sep 2, 2025 · Updated 7 months ago
- 100 days of building GPU kernels! ☆587 · Apr 27, 2025 · Updated 11 months ago
- This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast… ☆447 · Feb 22, 2025 · Updated last year
- GPU programming related news and material links ☆2,084 · Mar 8, 2026 · Updated last month
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 ☆10,217 · Updated this week
- Samples for CUDA Developers which demonstrates features in CUDA Toolkit ☆9,050 · Mar 30, 2026 · Updated last week
- Learnings and programs related to CUDA ☆437 · Jun 29, 2025 · Updated 9 months ago
- CUDA Learning guide ☆548 · Jun 20, 2024 · Updated last year
- Solve puzzles. Learn CUDA. ☆12,027 · Sep 1, 2024 · Updated last year
- NVIDIA curated collection of educational resources related to general purpose GPU programming. ☆1,446 · Mar 30, 2026 · Updated last week
- ☆425 · Apr 10, 2025 · Updated last year
- A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Proc… ☆900 · Mar 29, 2025 · Updated last year
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆470 · Mar 10, 2025 · Updated last year
- Puzzles for learning Triton ☆2,359 · Apr 1, 2026 · Updated last week
- LLM101n: Let's build a Storyteller ☆36,651 · Aug 1, 2024 · Updated last year
- Machine Learning Engineering Open Book ☆17,642 · Mar 16, 2026 · Updated 3 weeks ago
- Development repository for the Triton language and compiler ☆18,840 · Apr 4, 2026 · Updated last week
- GPU Kernels ☆223 · Apr 27, 2025 · Updated 11 months ago
- Complete solutions to the Programming Massively Parallel Processors Edition 4 ☆706 · Jun 18, 2025 · Updated 9 months ago
- Some CUDA example code with READMEs. ☆181 · Nov 11, 2025 · Updated 4 months ago
- FlashInfer: Kernel Library for LLM Serving ☆5,273 · Apr 4, 2026 · Updated last week
- Implement a ChatGPT-like LLM in PyTorch from scratch, step by step ☆90,284 · Updated this week
- Tile primitives for speedy kernels ☆3,304 · Mar 28, 2026 · Updated last week
- CUDA Library Samples ☆2,372 · Mar 17, 2026 · Updated 3 weeks ago
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆955 · Aug 19, 2024 · Updated last year
- Notes and code for Programming Massively Parallel Processors ☆13 · Mar 29, 2025 · Updated last year
- Simple problems implemented in CUDA C ☆35 · Apr 7, 2025 · Updated last year
- teaching software 2.0 to programmers of software 1.0 ☆63 · Apr 3, 2026 · Updated last week
- how to optimize some algorithm in cuda. ☆2,910 · Apr 1, 2026 · Updated last week
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆2,137 · Aug 26, 2025 · Updated 7 months ago
- Efficient Triton Kernels for LLM Training ☆6,265 · Updated this week
- Fast and memory-efficient exact attention ☆23,185 · Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆164 · Oct 19, 2023 · Updated 2 years ago
- Official code repo for the O'Reilly Book - "Hands-On Large Language Models" ☆24,849 · Dec 17, 2025 · Updated 3 months ago
- A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API ☆15,337 · Aug 8, 2024 · Updated last year