adam-maj / tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
☆8,591 Updated 11 months ago
Alternatives and similar repositories for tiny-gpu
Users who are interested in tiny-gpu are comparing it to the repositories listed below.
- LLM training in simple, raw C/CUDA ☆27,176 Updated 3 weeks ago
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,879 Updated this week
- Solve puzzles. Learn CUDA. ☆11,279 Updated 10 months ago
- OpenSource GPU, in Verilog, loosely based on RISC-V ISA ☆1,045 Updated 8 months ago
- Inference Llama 2 in one file of pure C ☆18,566 Updated 11 months ago
- Implementation for MatMul-free LM. ☆3,016 Updated this week
- Machine Learning Engineering Open Book ☆14,454 Updated this week
- Tile primitives for speedy kernels ☆2,523 Updated last week
- The n-gram Language Model ☆1,437 Updated 11 months ago
- Material for gpu-mode lectures ☆4,752 Updated last month
- lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,504 Updated last week
- CoreNet: A library for training deep neural networks ☆7,012 Updated 2 months ago
- ☆1,268 Updated 9 months ago
- 3D Visualization of a GPT-style LLM ☆4,791 Updated 10 months ago
- Video+code lecture on building nanoGPT from scratch ☆4,228 Updated 11 months ago
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,021 Updated 3 months ago
- High-speed Large Language Model Serving for Local Deployment ☆8,236 Updated 5 months ago
- A very simple and easy to understand RISC-V core. ☆1,272 Updated last year
- ☆1,587 Updated this week
- Puzzles for learning Triton ☆1,769 Updated 8 months ago
- If tinygrad wasn't small enough for you... ☆724 Updated last year
- ☆3,319 Updated 10 months ago
- Tensor library for machine learning ☆12,831 Updated last week
- llama3 implementation one matrix multiplication at a time ☆15,050 Updated last year
- A nanoGPT pipeline packed in a spreadsheet ☆2,118 Updated last year
- CUDA Templates for Linear Algebra Subroutines ☆8,078 Updated this week
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆9,761 Updated last year
- From the Tensor to Stable Diffusion, a rough outline for a 1 week course. ☆1,068 Updated this week
- NanoGPT (124M) in 3 minutes ☆2,851 Updated this week
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 ☆5,542 Updated last week