This repository documents my 100-day journey of learning and writing CUDA kernels.
☆32 · Mar 29, 2026 · Updated 3 months ago
Alternatives and similar repositories for 100-days-cuda
Users that are interested in 100-days-cuda are comparing it to the libraries listed below.
- ☆11 · Aug 4, 2025 · Updated 10 months ago
- A series of high-performance GEMM (General Matrix Multiply) implementations iteratively optimised for H100 GPUs in pure CUDA. ☆79 · Feb 18, 2026 · Updated 4 months ago
- learning & making kernels in cuda / triton ☆22 · Aug 24, 2025 · Updated 10 months ago
- 100 days of building GPU kernels! ☆607 · Apr 27, 2025 · Updated last year
- A repository with data about APTs ☆13 · Nov 24, 2022 · Updated 3 years ago
- This project is a versatile and powerful search tool that leverages state-of-the-art natural language processing models to provide releva… ☆12 · Apr 3, 2023 · Updated 3 years ago
- ☆10 · Aug 27, 2022 · Updated 3 years ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆19 · Feb 9, 2026 · Updated 4 months ago
- Scripts and outputs for ATLAS data in STIX JSON and ATT&CK Navigator layer formats ☆32 · Updated this week
- LLM query engine to retrieve augmented responses from json files. ☆15 · Oct 12, 2023 · Updated 2 years ago
- The Tifinagh Hand-written Letters Dataset ☆12 · Feb 17, 2024 · Updated 2 years ago
- ☆24 · May 26, 2026 · Updated last month
- Using FlexAttention to compute attention with different masking patterns ☆47 · Sep 22, 2024 · Updated last year
- This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast… ☆454 · Feb 22, 2025 · Updated last year
- Training framework for Large Behavioral Models ☆28 · Sep 17, 2025 · Updated 9 months ago
- A straightforward method to reduce your LLM inference API costs and token usage. ☆24 · May 18, 2025 · Updated last year
- Fork of Rust concurrent hash map benchmarks to include leapfrog map. ☆14 · Mar 13, 2022 · Updated 4 years ago
- A collection of GPU experiments and benchmarks for my personal understanding and research. ☆31 · Jun 15, 2026 · Updated 2 weeks ago
- A Beginner's Guide to Monetizing Your Python AI Chatbot ☆17 · Apr 22, 2025 · Updated last year
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- RyuseiLight is a beautiful, lightweight and extensible syntax highlighter. ☆15 · Aug 9, 2021 · Updated 4 years ago
- coding CUDA every day! ☆76 · Feb 5, 2026 · Updated 4 months ago
- Machine Learning in Darija ☆24 · Jul 10, 2020 · Updated 5 years ago
- The Vulkan GPU radix sort implementation from Google Fuchsia, but with CMake ☆15 · Jan 13, 2023 · Updated 3 years ago
- Composition of Multimodal Language Models From Scratch ☆15 · Aug 16, 2024 · Updated last year
- Header-only skip list library for modern C++ (C++17/C++20) ☆18 · Feb 1, 2022 · Updated 4 years ago
- Contains my solutions for various online judge problems, organized in the worst possible way ☆14 · Jul 25, 2015 · Updated 10 years ago
- ☆13 · Oct 9, 2024 · Updated last year
- An STL-like graph library ☆13 · Sep 5, 2025 · Updated 9 months ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆21 · Jan 24, 2025 · Updated last year
- Install `wasm-bindgen` by downloading the executable ☆12 · Mar 3, 2023 · Updated 3 years ago
- Comprehensive option pricing and strategy analyzer ☆14 · Mar 29, 2025 · Updated last year
- Generate PDF/PNG slides from source code ☆12 · Oct 29, 2024 · Updated last year
- From a+b to sparsemax(QK^T)V in Triton! ☆34 · Jun 19, 2025 · Updated last year
- ☆10 · Dec 23, 2023 · Updated 2 years ago
- ☆160 · Updated this week
- Tutorial for (PyTorch) + (C++) + (Metal shader) ☆16 · Oct 25, 2025 · Updated 8 months ago
- A comprehensive hands-on project for learning GPU programming with CUDA and HIP, covering fundamental concepts through advanced optimizat… ☆37 · Nov 20, 2025 · Updated 7 months ago
- A curation of awesome portfolio website ideas for developers and designers to draw inspiration from. Raise a pull request to add more. 💜… ☆17 · Apr 15, 2025 · Updated last year