jalexine / gpucodes
codes documenting my gpu learning journey
☆57 Updated last week
Alternatives and similar repositories for gpucodes
Users who are interested in gpucodes compare it to the repositories listed below
- ☆261 Updated 3 weeks ago
- A fully functional and simple Machine Learning library made entirely from scratch with Python. ☆323 Updated 2 weeks ago
- Here's all my Python/Numba (CUDA) code for the encoder block I made :) ☆68 Updated 7 months ago
- pytorch from scratch in pure C/CUDA and python ☆41 Updated last year
- Notes and exploration code for learning about AI/ML ☆199 Updated this week
- ☆398 Updated 7 months ago
- ☆96 Updated last month
- (WIP) A small but powerful, homemade PyTorch from scratch. ☆660 Updated this week
- Learning about CUDA by writing PTX code. ☆147 Updated last year
- Implementations of Papers that I read, you can read my breakdown in my blog ☆88 Updated last month
- CUDA tutorials for Maths & ML tutorials with examples, covers multi-gpus, fused attention, winograd convolution, reinforcement learning. ☆200 Updated 5 months ago
- Will write CUDA for 100 days ☆35 Updated 6 months ago
- Learnings and programs related to CUDA ☆426 Updated 5 months ago
- Complete solutions to the Programming Massively Parallel Processors Edition 4 ☆589 Updated 5 months ago
- CPU inference for the DeepSeek family of large language models in C++ ☆314 Updated last month
- ☆45 Updated 6 months ago
- ☆168 Updated last year
- in this repository, i'm going to implement increasingly complex llm inference optimizations ☆70 Updated 6 months ago
- Educational implementation of a small GPT model from scratch in a single Jupyter Notebook ☆116 Updated 9 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆367 Updated 7 months ago
- ☆46 Updated 8 months ago
- GPU documentation for humans ☆413 Updated last week
- LLM training in simple, raw C/CUDA ☆108 Updated last year
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆535 Updated 2 months ago
- Static suckless single batch CUDA-only qwen3-0.6B mini inference engine ☆513 Updated 2 months ago
- Model Activity Visualiser ☆519 Updated 7 months ago
- ☆459 Updated 3 months ago
- Setting up Vscode to work with Pytorch in C/C++ with CUDA support ☆25 Updated 9 months ago
- Quantized LLM training in pure CUDA/C++. ☆218 Updated this week
- An interactive web-based demonstration of fundamental tabular Reinforcement Learning (RL) algorithms in a simple grid world environment. ☆85 Updated 5 months ago