This repository documents my 100-day journey of learning and writing CUDA kernels.
☆25 · Jun 25, 2025 · Updated 8 months ago
Alternatives and similar repositories for 100-days-cuda
Users who are interested in 100-days-cuda are comparing it to the libraries listed below
- ☆10 · Dec 23, 2023 · Updated 2 years ago
- This repository contains an analysis of the effects of COVID-19 on trade trends up to December 2021. The dataset used provides daily trad… ☆15 · Aug 16, 2023 · Updated 2 years ago
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 7 months ago
- ☆23 · Jul 11, 2025 · Updated 7 months ago
- The Django Blog Platform is a comprehensive web application designed for blogging purposes, built with Django framework. It empowers user… ☆11 · Feb 24, 2024 · Updated 2 years ago
- ☆16 · Feb 24, 2026 · Updated 2 weeks ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆17 · Feb 9, 2026 · Updated last month
- Tutorial for (PyTorch) + (C++) + (Metal shader) ☆16 · Oct 25, 2025 · Updated 4 months ago
- ☆11 · Mar 30, 2025 · Updated 11 months ago
- Convert an image sequence to a PLY point cloud. ☆15 · Oct 14, 2017 · Updated 8 years ago
- Header-only skip list library for modern C++ (C++17/C++20) ☆19 · Feb 1, 2022 · Updated 4 years ago
- My leetcode solutions ☆11 · Jan 11, 2023 · Updated 3 years ago
- 100 days of building GPU kernels! ☆575 · Apr 27, 2025 · Updated 10 months ago
- ☆105 · Feb 25, 2026 · Updated last week
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
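- DBMS, the 2022 major course project for "Introduction to Database Systems" at the Department of Computer Science, Tsinghua University; supports parsing and execution of basic SQL. ☆12 · Jan 12, 2023 · Updated 3 years ago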
- Personal solutions to the Triton Puzzles ☆20 · Jul 18, 2024 · Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆48 · Jan 21, 2026 · Updated last month
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆20 · Jan 24, 2025 · Updated last year
- Measuring Thinking Efficiency in Reasoning Models - Research Repository ☆39 · Dec 2, 2025 · Updated 3 months ago
- Ship correct and fast LLM kernels to PyTorch ☆144 · Jan 14, 2026 · Updated last month
- ☆18 · Jun 29, 2021 · Updated 4 years ago
- A series of high-performance GEMM (General Matrix Multiply) implementations Iteratively optimised for H100 GPUs in Pure CUDA. ☆71 · Feb 18, 2026 · Updated 2 weeks ago
- MFCC implementation with detailed comments. ☆17 · Nov 26, 2020 · Updated 5 years ago
- Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200 ☆66 · Feb 28, 2026 · Updated last week
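- Major lab project for "Principles of Computer Organization" at Tsinghua University: a five-stage pipelined RISC-V processor. "Three weeks of hard work to build a computer." ☆22 · Mar 11, 2023 · Updated 2 years ago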
- Computer Assisted Police Sketching Using Generative Adversarial Networks (PYTHON-3) ☆20 · Jun 28, 2019 · Updated 6 years ago
- learning & making kernels in cuda / triton ☆22 · Aug 24, 2025 · Updated 6 months ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆28 · Jan 25, 2025 · Updated last year
- Chinese-language notes on the second edition of "Reinforce Learning: An Introduction" ☆21 · Apr 10, 2019 · Updated 6 years ago
- A high-performance attention mechanism that computes softmax normalization in a single streaming pass using running accumulators (online … ☆29 · Oct 11, 2025 · Updated 4 months ago
- >>> Exceptions and interrupts + virtual-memory page tables + branch prediction + TLB + Cache + Flash + VGA + uCore ☆20 · Nov 17, 2023 · Updated 2 years ago
- ☆19 · Mar 3, 2025 · Updated last year
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆64 · Jan 26, 2026 · Updated last month
- Sample Codes using NVSHMEM on Multi-GPU ☆30 · Jan 22, 2023 · Updated 3 years ago
- ☆31 · Jul 16, 2025 · Updated 7 months ago
- Codebase for Cuda Learning ☆31 · Jul 13, 2024 · Updated last year
- ☆31 · Jun 22, 2025 · Updated 8 months ago
- This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast… ☆441 · Feb 22, 2025 · Updated last year