apoorvnandan / lilgrad
pytorch from scratch in pure C/CUDA and python
☆34Updated last month
Related projects
Alternatives and complementary repositories for lilgrad
- small auto-grad engine inspired by Karpathy's micrograd and PyTorch☆95Updated 4 months ago
- Simple Byte Pair Encoding mechanism used for the tokenization process, written purely in C☆120Updated 4 months ago
- ☆47Updated 3 months ago
- The Tensor (or Array)☆408Updated 3 months ago
- Following master Karpathy with GPT-2 implementation and training, writing lots of comments 'cause I have the memory of a goldfish☆167Updated 3 months ago
- Andrej Karpathy's micrograd implemented in C☆29Updated 3 months ago
- A really tiny autograd engine☆87Updated 7 months ago
- Nvidia Instruction Set Specification Generator☆215Updated 4 months ago
- Solve puzzles to improve your tinygrad skills!☆87Updated last month
- Fast, Multi-threaded Matrix Multiplication in C☆181Updated 3 weeks ago
- A minimal Tensor Processing Unit (TPU) inspired by Google's TPUv1.☆116Updated 3 months ago
- An MNIST neural network written from scratch in Odin, visualised with Raylib☆158Updated last month
- my little linear algebra library☆44Updated 4 months ago
- Port of Karpathy's micrograd in pure C. Micrograd is a tiny scalar-valued autograd engine and a neural net library on top of it with PyTo…☆27Updated 3 months ago
- parallelized hyperdimensional tictactoe☆110Updated 2 months ago
- UNet diffusion model in pure CUDA☆567Updated 4 months ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C.☆21Updated 4 months ago
- a tiny multidimensional array implementation in C similar to numpy, but only one file.☆216Updated 3 months ago
- machine learning from absolute scratch in c. gradients, linear algebra ops & everything else without using any third party library!☆21Updated 3 months ago
- Alex Krizhevsky's original code from Google Code☆188Updated 8 years ago
- This repo has all the basic things you'll need in order to understand the complete vision transformer architecture and its various implementa…☆169Updated last month
- LLM training in simple, raw C/CUDA☆86Updated 6 months ago
- Because tinygrad got out of hand with line count☆143Updated 3 weeks ago
- Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.)☆51Updated 3 months ago
- could we make an ml stack in 100,000 lines of code?☆25Updated 3 months ago
- port of Andrej Karpathy's llm.c to Mojo☆321Updated 3 weeks ago
- Tensor library with autograd using only Rust's standard library☆62Updated 4 months ago
- creating a tiny tensor library in raw C☆529Updated 3 weeks ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)☆113Updated 5 months ago
- a highly efficient compression algorithm for the n1 implant (neuralink's compression challenge)☆45Updated 5 months ago