geohot / gpunoob
Noob Lessons from Stream about how GPUs work
☆130 Updated 6 months ago
Alternatives and similar repositories for gpunoob
Users that are interested in gpunoob are comparing it to the libraries listed below
- parallelized hyperdimensional tictactoe ☆125 Updated last year
- ☆94 Updated last week
- Can RL solve simple problems? ☆54 Updated last year
- Solve puzzles to improve your tinygrad skills! ☆145 Updated 2 weeks ago
- An implementation of delta-iris in tinygrad ☆72 Updated last year
- could we make an ml stack in 100,000 lines of code? ☆46 Updated last year
- Tutorials on tinygrad ☆431 Updated 2 weeks ago
- small auto-grad engine inspired from Karpathy's micrograd and PyTorch ☆276 Updated 11 months ago
- An implementation of a deep learning framework and models in C ☆48 Updated 6 months ago
- Tensor library with autograd using only Rust's standard library ☆70 Updated last year
- a tiny multidimensional array implementation in C similar to numpy, but only one file. ☆228 Updated last year
- The Tensor (or Array) ☆451 Updated last year
- pytorch from scratch in pure C/CUDA and python ☆41 Updated last year
- The simplest way to run LLMs anywhere ☆106 Updated last year
- Learning about CUDA by writing PTX code. ☆145 Updated last year
- Learnings and programs related to CUDA ☆422 Updated 4 months ago
- If tinygrad wasn't small enough for you... ☆743 Updated last year
- A light tensor library in zig. ☆78 Updated 8 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 Updated last year
- Nvidia Instruction Set Specification Generator ☆297 Updated last year
- Alex Krizhevsky's original code from Google Code ☆199 Updated 9 years ago
- (WIP) A small but powerful, homemade PyTorch from scratch. ☆643 Updated last week
- in this repository, i'm going to implement increasingly complex llm inference optimizations ☆68 Updated 5 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆367 Updated 6 months ago
- Complete solutions to the Programming Massively Parallel Processors Edition 4 ☆556 Updated 4 months ago
- Can you design a controller to steer a simulated car? ☆307 Updated 3 months ago
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆163 Updated 9 months ago
- SIMD quantization kernels ☆89 Updated last month
- tiny code to access tenstorrent blackhole ☆60 Updated 5 months ago
- RDNA3 emulator ☆54 Updated 6 months ago