adam-maj / tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
☆7,847 Updated 6 months ago
Alternatives and similar repositories for tiny-gpu:
Users who are interested in tiny-gpu are comparing it to the repositories listed below.
- LLM training in simple, raw C/CUDA ☆25,580 Updated 4 months ago
- Implementation for MatMul-free LM. ☆2,960 Updated 3 months ago
- llama3 implementation one matrix multiplication at a time ☆14,134 Updated 8 months ago
- lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,106 Updated this week
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆9,400 Updated 7 months ago
- OpenSource GPU, in Verilog, loosely based on RISC-V ISA ☆919 Updated 2 months ago
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,818 Updated last week
- Inference Llama 2 in one file of pure C ☆18,027 Updated 6 months ago
- The official PyTorch implementation of Google's Gemma models ☆5,351 Updated last month
- MLX: An array framework for Apple silicon ☆19,097 Updated this week
- Video+code lecture on building nanoGPT from scratch ☆3,869 Updated 6 months ago
- Material for gpu-mode lectures ☆3,701 Updated last week
- NanoGPT (124M) in 3 minutes ☆2,278 Updated this week
- A deep-dive on the entire history of deep-learning ☆1,160 Updated 7 months ago
- A PyTorch native library for large model training ☆3,300 Updated this week
- Distribute and run LLMs with a single file. ☆21,718 Updated 2 weeks ago
- From the Tensor to Stable Diffusion, a rough outline for a 1 week course. ☆1,047 Updated last month
- Solve puzzles. Learn CUDA. ☆10,499 Updated 5 months ago
- Puzzles for learning Triton ☆1,393 Updated 3 months ago
- CoreNet: A library for training deep neural networks ☆7,002 Updated 4 months ago
- Tile primitives for speedy kernels ☆2,032 Updated this week
- The n-gram Language Model ☆1,383 Updated 6 months ago
- Development repository for the Triton language and compiler ☆14,406 Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆39,377 Updated 2 months ago
- Tensor library for machine learning ☆11,857 Updated this week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆5,784 Updated 2 months ago
- Examples in the MLX framework ☆6,939 Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,499 Updated this week
- Blazingly fast LLM inference. ☆5,022 Updated this week
- Efficient Triton Kernels for LLM Training ☆4,415 Updated this week