jalexine / gpucodes
codes documenting my gpu learning journey
☆77 · Updated last month
Alternatives and similar repositories for gpucodes
Users who are interested in gpucodes are comparing it to the repositories listed below.
- Here's all my Python/Numba (CUDA) code for the encoder block I made :) ☆71 · Updated 9 months ago
- pytorch from scratch in pure C/CUDA and python ☆41 · Updated last year
- GPU documentation for humans ☆518 · Updated 2 weeks ago
- Learning about CUDA by writing PTX code. ☆152 · Updated last year
- Complete solutions to the Programming Massively Parallel Processors Edition 4 ☆658 · Updated 7 months ago
- ☆42 · Updated last year
- ☆120 · Updated 2 months ago
- ☆415 · Updated 10 months ago
- Quantized LLM training in pure CUDA/C++. ☆238 · Updated 3 weeks ago
- LLM training in simple, raw C/CUDA ☆112 · Updated last year
- A C++ port of karpathy/llm.c features a tiny torch library while maintaining overall simplicity. ☆42 · Updated last year
- Some CUDA example code with READMEs. ☆179 · Updated 3 months ago
- A character-level language diffusion model trained on Tiny Shakespeare ☆851 · Updated 3 weeks ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆377 · Updated 9 months ago
- (WIP) A small but powerful, homemade PyTorch from scratch. ☆674 · Updated last week
- in this repository, i'm going to implement increasingly complex llm inference optimizations ☆81 · Updated 8 months ago
- A PyTorch implementation of the GPT-OSS-20B architecture. All components are coded from scratch: RoPE with YaRN, RMSNorm, SwiGLU with cla… ☆204 · Updated 2 months ago
- Visualization of cache-optimized matrix multiplication ☆157 · Updated 10 months ago
- A comprehensive systems programming toolkit implementing low-level concepts in C, from memory management to OS internals. Features practi… ☆73 · Updated 11 months ago
- Learnings and programs related to CUDA ☆432 · Updated 7 months ago
- PyTorch memory allocation visualizer ☆67 · Updated 6 months ago
- Supercomputing for Artificial Intelligence ☆57 · Updated 3 weeks ago
- Setting up Vscode to work with Pytorch in C/C++ with CUDA support ☆25 · Updated last year
- Inference Llama 2 in C++ ☆43 · Updated last year
- my little linear algebra library ☆43 · Updated last year
- Simple MPI implementation for prototyping or learning ☆300 · Updated 6 months ago
- a teaching deep learning framework: the bridge from micrograd to tinygrad ☆53 · Updated this week
- Neural network in C for recognizing american sign language(ASL) from scratch on the MNIST dataset. Optimized with parallel training. Cann… ☆38 · Updated last year
- A tinycompiler in C from scratch ☆109 · Updated last year
- Alex Krizhevsky's original code from Google Code ☆199 · Updated 9 years ago