AkashKarnatak / 100-days-of-cuda

Will write CUDA for 100 days

☆32

Alternatives and similar repositories for 100-days-of-cuda

Users who are interested in 100-days-of-cuda compare it to the repositories listed below

HenryNdubuaku / cuda-tutorials
CUDA tutorials for Maths & ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.
☆187 Updated last month
golfxiao / ScratchLLMStepByStep
A hands-on tutorial that teaches you, step by step, to write a GPT from scratch and train a large language model
☆82 Updated 5 months ago
hkproj / 100-days-of-gpu
☆350 Updated 3 months ago
andrewkchan / yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
☆388 Updated last month
andrewkchan / deepseek.cpp
CPU inference for the DeepSeek family of large language models in C++
☆308 Updated last month
gpu-mode / awesomeMLSys
An ML Systems Onboarding list
☆836 Updated 5 months ago
rkinas / cuda-learning
This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast…
☆360 Updated 4 months ago
NiuTrans / compiler-notes
☆80 Updated 2 years ago
KaihuaTang / Building-a-Small-LLM-from-Scratch
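The goal of this series is to let readers understand and implement every component of a large language model from scratch, along with the training and fine-tuning code, using only basic PyTorch and no other ready-made external libraries. Readers therefore need only Python, PyTorch, and the most basic deep learning background.
☆351 Updated last week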
danbev / learning-ai
Notes and exploration code for learning about AI/ML
☆182 Updated this week
lucasdelimanogueira / PyNorch
Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
☆150 Updated last year
muyuuuu / CMakeGuide
A short CMake getting-started guide with examples
☆52 Updated 8 months ago
CisMine / Parallel-Computing-Cuda-C
CUDA Learning guide
☆403 Updated last year
FareedKhan-dev / train-deepseek-r1
Building DeepSeek R1 from Scratch
☆654 Updated 3 months ago
apoorvnandan / tensor.h
creating a tiny tensor library in raw C
☆732 Updated 4 months ago
h3ct0rjs / HighPerformanceComputing
Class of High Performance Computing taken at U.T.P 2017
☆69 Updated 7 years ago
AIDajiangtang / LLM-from-scratch
Learn large models from scratch: Transformer, GPT2, and BERT pre-training and fine-tuning
☆34 Updated last year
salykova / sgemm.c
Multi-Threaded FP32 Matrix Multiplication on x86 CPUs
☆350 Updated 2 months ago
Maharshi-Pandya / cudacodes
Learnings and programs related to CUDA
☆411 Updated 2 weeks ago
AdepojuJeremy / CUDA-120-DAYS--CHALLENGE
A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Proc…
☆713 Updated 3 months ago
rodmarkun / SmolML
A fully functional and simple Machine Learning library made entirely from scratch with Python.
☆295 Updated last month
loganwatchorn / notes-pmpp
Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.)
☆53 Updated 11 months ago
harleyszhang / llm_note
LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.
☆798 Updated this week
ForceInjection / AI-fundermentals
AI fundamentals - GPU architecture, CUDA programming, large model basics, and AI Agent related knowledge
☆159 Updated this week
apoorvnandan / lilgrad
pytorch from scratch in pure C/CUDA and python
☆40 Updated 9 months ago
nvixnu / pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
☆72 Updated 4 years ago
Quentin-Anthony / nanoMPI
Simple MPI implementation for prototyping or learning
☆263 Updated 3 weeks ago
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆379 Updated 4 months ago
mohitmishra786 / exploring-os
This repository is a journey through Operating System concepts, with practical implementations in C. Each day focuses on a specific topic…
☆286 Updated this week
dvgodoy / FineTuningLLMs
Official repository of my book "A Hands-On Guide to Fine-Tuning LLMs with PyTorch and Hugging Face"
☆330 Updated 3 months ago