This repository documents my 100-day journey of learning and writing CUDA kernels.
☆31Mar 29, 2026Updated last month
Alternatives and similar repositories for 100-days-cuda
Users that are interested in 100-days-cuda are comparing it to the libraries listed below.
- Challenging myself to learn CUDA (Basics → Intermediate) these 100 days.☆34Mar 2, 2026Updated 2 months ago
- A series of high-performance GEMM (General Matrix Multiply) implementations, iteratively optimised for H100 GPUs in pure CUDA.☆77Feb 18, 2026Updated 3 months ago
- learning & making kernels in cuda / triton☆22Aug 24, 2025Updated 8 months ago
- ☆11Mar 30, 2025Updated last year
- 100 days of building GPU kernels!☆596Apr 27, 2025Updated last year
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.☆19Feb 9, 2026Updated 3 months ago
- ☆24Apr 7, 2026Updated last month
- Using FlexAttention to compute attention with different masking patterns☆47Sep 22, 2024Updated last year
- Minimal TPU implementation with 8x8 systolic array and PyTorch integration☆61Jan 26, 2026Updated 3 months ago
- A collection of GPU experiments and benchmarks for my personal understanding and research.☆30Apr 9, 2026Updated last month
- A Beginner's Guide to Monetizing Your Python AI Chatbot☆16Apr 22, 2025Updated last year
- Load and run Llama from safetensors files in C☆15Oct 24, 2024Updated last year
- RyuseiLight is a beautiful, lightweight and extensible syntax highlighter.☆15Aug 9, 2021Updated 4 years ago
- coding CUDA everyday!☆77Feb 5, 2026Updated 3 months ago
- Composition of Multimodal Language Models From Scratch☆15Aug 16, 2024Updated last year
- Implementation of 12 AI agents evaluation techniques☆43Jul 31, 2025Updated 9 months ago
- ☆20Apr 24, 2026Updated 3 weeks ago
- ☆13Oct 9, 2024Updated last year
- High Performance FP8 GEMM Kernels for SM89 and later GPUs.☆21Jan 24, 2025Updated last year
- Install `wasm-bindgen` by downloading the executable☆12Mar 3, 2023Updated 3 years ago
- EB1A DIY Collection☆16Nov 17, 2025Updated 6 months ago
- ☆149Apr 4, 2026Updated last month
- From a+b to sparsemax(QK^T)V in Triton!☆34Jun 19, 2025Updated 11 months ago
- Generate PDF/PNG slides from source code☆12Oct 29, 2024Updated last year
- Tutorial for (PyTorch) + (C++) + (Metal shader)☆16Oct 25, 2025Updated 6 months ago
- An emerald-depositing bot made specifically for the 廢土伺服器 (Wasteland server)☆16Mar 20, 2024Updated 2 years ago
- A comprehensive hands-on project for learning GPU programming with CUDA and HIP, covering fundamental concepts through advanced optimizat…☆35Nov 20, 2025Updated 5 months ago
- Apply GPU in ML and DL☆69Mar 23, 2026Updated last month
- VNHSGE: Vietnamese High School Graduation Examination Dataset for Large Language Models☆29Jul 24, 2023Updated 2 years ago
- Synthetic data generation for evaluating LLM symbolic and logic reasoning☆22Mar 6, 2026Updated 2 months ago
- A Transformer Model Exploiting Histology Images and Spatial Gene Expression☆22Mar 18, 2025Updated last year
- implement GPT-OSS 20B & 120B C++ inference from scratch on AMD GPUs☆173Oct 25, 2025Updated 6 months ago
- Personal solutions to the Triton Puzzles☆21Jul 18, 2024Updated last year
- A command-line tool for converting SVG images to PDF files☆17Mar 29, 2025Updated last year
- ☆19Mar 3, 2025Updated last year
- General Matrix Multiplication using NVIDIA Tensor Cores☆28Jan 25, 2025Updated last year
- Learn RL Techniques in 3 Easy Projects☆20Oct 16, 2024Updated last year
- Houdini Python Wiki☆18Mar 18, 2024Updated 2 years ago
- Collective and Neighbor Collective Optimizations and Extensions☆13Mar 26, 2026Updated last month