ulrichstern / cuda-convnet
Alex Krizhevsky's original code from Google Code
☆191 Updated 9 years ago
Alternatives and similar repositories for cuda-convnet:
Users that are interested in cuda-convnet are comparing it to the libraries listed below
- Learning about CUDA by writing PTX code. ☆128 Updated last year
- A really tiny autograd engine ☆92 Updated last year
- Simple byte pair encoding mechanism used for tokenization, written purely in C ☆129 Updated 5 months ago
- UNet diffusion model in pure CUDA ☆601 Updated 10 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆130 Updated last year
- ☆155 Updated last year
- Following master Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish ☆174 Updated 9 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆180 Updated last year
- The Tensor (or Array) ☆432 Updated 8 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆180 Updated last week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆536 Updated last week
- Learnings and programs related to CUDA ☆380 Updated 2 months ago
- Solve puzzles to improve your tinygrad skills! ☆123 Updated last month
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 Updated last month
- Solve puzzles. Learn CUDA. ☆64 Updated last year
- Simple Transformer in Jax ☆136 Updated 10 months ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆150 Updated 11 months ago
- ☆202 Updated last week
- ☆181 Updated 2 months ago
- Small autograd engine inspired by Karpathy's micrograd and PyTorch ☆267 Updated 5 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆272 Updated 10 months ago
- High-Performance SGEMM on CUDA devices ☆90 Updated 3 months ago
- The Multilayer Perceptron Language Model ☆547 Updated 8 months ago
- ☆88 Updated last year
- Fast bare-bones BPE for modern tokenizer training ☆154 Updated last month
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆290 Updated 3 months ago
- ring-attention experiments ☆132 Updated 6 months ago
- Tutorials on tinygrad ☆373 Updated last month
- ☆99 Updated last year
- LLM training in simple, raw C/CUDA ☆94 Updated last year