adam-maj / tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
☆8,641 Updated 11 months ago
Alternatives and similar repositories for tiny-gpu
Users who are interested in tiny-gpu are comparing it to the repositories listed below.
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,899 Updated last month
- LLM training in simple, raw C/CUDA ☆27,349 Updated last month
- Implementation for MatMul-free LM. ☆3,029 Updated 3 weeks ago
- OpenSource GPU, in Verilog, loosely based on RISC-V ISA ☆1,059 Updated 8 months ago
- lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,529 Updated this week
- Solve puzzles. Learn CUDA. ☆11,355 Updated 11 months ago
- Material for gpu-mode lectures ☆4,842 Updated last month
- ☆1,270 Updated 10 months ago
- Envision a future where every student can read all the code of a teaching operating system. ☆2,352 Updated last week
- Open-source high-performance RISC-V processor ☆6,544 Updated this week
- Tile primitives for speedy kernels ☆2,570 Updated last week
- Inference Llama 2 in one file of pure C ☆18,626 Updated last year
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,044 Updated 4 months ago
- The official PyTorch implementation of Google's Gemma models ☆5,529 Updated 2 months ago
- High-speed Large Language Model Serving for Local Deployment ☆8,304 Updated 2 weeks ago
- A tiny C header-only risc-v emulator. ☆1,962 Updated 3 months ago
- RISC-V Linux SoC, marchID: 0x2b ☆936 Updated 2 weeks ago
- Large World Model -- Modeling Text and Video with Millions Context ☆7,324 Updated 9 months ago
- If tinygrad wasn't small enough for you... ☆728 Updated last year
- GPU programming related news and material links ☆1,652 Updated 7 months ago
- Learning FPGA, yosys, nextpnr, and RISC-V ☆2,875 Updated 5 months ago
- Puzzles for learning Triton ☆1,832 Updated 8 months ago
- NanoGPT (124M) in 3 minutes ☆3,025 Updated 3 weeks ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆9,836 Updated last year
- Deep learning at the speed of light. ☆2,088 Updated this week
- llama3 implementation one matrix multiplication at a time ☆15,097 Updated last year
- ☆4,086 Updated last year
- Writing an OS in 1,000 lines. ☆2,719 Updated last week
- My favorite C programming practices. ☆2,114 Updated 4 years ago
- CoreNet: A library for training deep neural networks ☆7,017 Updated 3 months ago