adam-maj / tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
☆8,993 · Updated last year
Alternatives and similar repositories for tiny-gpu
Users who are interested in tiny-gpu are comparing it to the libraries listed below.
- LLM training in simple, raw C/CUDA ☆28,414 · Updated 5 months ago
- Open-source high-performance RISC-V processor ☆6,802 · Updated this week
- OpenSource GPU, in Verilog, loosely based on RISC-V ISA ☆1,146 · Updated last year
- Inference Llama 2 in one file of pure C ☆19,032 · Updated last year
- ☆1,279 · Updated last year
- Solve puzzles. Learn CUDA. ☆11,834 · Updated last year
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,923 · Updated 2 months ago
- Lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,643 · Updated this week
- Tile primitives for speedy kernels ☆3,008 · Updated 2 weeks ago
- CoreNet: A library for training deep neural networks ☆7,023 · Updated 2 months ago
- Material for gpu-mode lectures ☆5,432 · Updated 2 weeks ago
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,169 · Updated 4 months ago
- If tinygrad wasn't small enough for you... ☆759 · Updated last year
- Implementation for MatMul-free LM. ☆3,042 · Updated 3 weeks ago
- A PyTorch native platform for training generative AI models ☆4,866 · Updated this week
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆10,217 · Updated last year
- Tensor library for machine learning ☆13,743 · Updated last week
- Blazingly fast LLM inference. ☆6,296 · Updated this week
- A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1 ☆1,066 · Updated 4 months ago
- Envision a future where every student can read all the code of a teaching operating system. ☆2,381 · Updated last month
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆8,991 · Updated this week
- NanoGPT (124M) in 3 minutes ☆3,974 · Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,623 · Updated 3 months ago
- Solve puzzles. Improve your pytorch. ☆3,851 · Updated last year
- ☆1,818 · Updated last week
- GNU toolchain for RISC-V, including GCC ☆4,280 · Updated last week
- llama3 implementation one matrix multiplication at a time ☆15,203 · Updated last year
- A computer science textbook ☆4,545 · Updated last year
- Distributed LLM and StableDiffusion inference for mobile, desktop and server. ☆2,900 · Updated last year
- RISC-V XV6/Linux SoC, marchID: 0x2b ☆1,003 · Updated 3 weeks ago