small auto-grad engine inspired by Karpathy's micrograd and PyTorch
☆275 · Nov 21, 2024 · Updated last year
Alternatives and similar repositories for smolgrad
Users that are interested in smolgrad are comparing it to the libraries listed below
- a tiny multidimensional array implementation in C similar to numpy, but only one file. ☆226 · Aug 2, 2024 · Updated last year
- Learnings and programs related to CUDA ☆434 · Jun 29, 2025 · Updated 8 months ago
- a tiny vectorstore implementation built with numpy. ☆64 · Apr 26, 2024 · Updated last year
- A repository consisting of paper/architecture replications of classic/SOTA AI/ML papers in PyTorch ☆408 · Nov 11, 2025 · Updated 3 months ago
- learningggggggg 🐳 ☆576 · Apr 2, 2025 · Updated 11 months ago
- (WIP) A small but powerful, homemade PyTorch from scratch. ☆676 · Feb 24, 2026 · Updated last week
- Simple Transformer in Jax ☆143 · Jun 22, 2024 · Updated last year
- This repo has all the basic things you'll need in order to understand complete vision transformer architecture and its various implementa… ☆229 · Jan 2, 2025 · Updated last year
- High Quality Resources on GPU Programming/Architecture ☆593 · Jul 26, 2024 · Updated last year
- From the Tensor to Stable Diffusion, a rough outline for a 1 week course. ☆1,074 · Oct 5, 2025 · Updated 5 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆62 · Nov 4, 2024 · Updated last year
- parallelized hyperdimensional tictactoe ☆126 · Aug 25, 2024 · Updated last year
- just me trying to implement deep learning concepts in code ☆213 · Nov 8, 2025 · Updated 3 months ago
- A browser extension that demos Gemini Nano via window.ai and Cartesia TTS ⚡️ ☆38 · Jul 10, 2024 · Updated last year
- ☆27 · Jul 9, 2024 · Updated last year
- A deep-dive on the entire history of deep-learning ☆1,538 · Jul 16, 2024 · Updated last year
- Andrej Karpathy's micrograd implemented in C ☆30 · Aug 7, 2024 · Updated last year
- An ML Systems Onboarding list ☆1,000 · Feb 19, 2026 · Updated 2 weeks ago
- my little linear algebra library ☆43 · Jul 7, 2024 · Updated last year
- UNet diffusion model in pure CUDA ☆657 · Jun 28, 2024 · Updated last year
- My submission for the GPUMODE/AMD fp8 mm challenge ☆29 · Jun 4, 2025 · Updated 9 months ago
- High-Performance FP32 GEMM on CUDA devices ☆117 · Jan 21, 2025 · Updated last year
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆461 · Mar 10, 2025 · Updated 11 months ago
- NanoGPT (124M) in 2 minutes ☆4,734 · Feb 27, 2026 · Updated last week
- A platform aimed at creating websites that perform self-optimization ☆12 · May 4, 2024 · Updated last year
- Entropy Based Sampling and Parallel CoT Decoding ☆3,434 · Nov 13, 2024 · Updated last year
- Official codebase for our paper "Do Language Models Use Their Depth Efficiently?" ☆29 · Jun 25, 2025 · Updated 8 months ago
- ☆12 · Aug 26, 2025 · Updated 6 months ago
- Changes in this fork have been merged to upstream. ☆16 · Jun 10, 2025 · Updated 8 months ago
- I like to learn new things ☆10 · Feb 28, 2026 · Updated last week
- Pull high-quality, efficient embeddings for PubMed, arXiv and Wikipedia from Huggingface and use for local LLM inference/Retrieval Augmen… ☆47 · Feb 16, 2024 · Updated 2 years ago
- Code for "What really matters in matrix-whitening optimizers?" ☆22 · Oct 31, 2025 · Updated 4 months ago
- Hand-Rolled GPU communications library ☆86 · Nov 25, 2025 · Updated 3 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆379 · Apr 21, 2025 · Updated 10 months ago
- Deploying a Machine Learning model streaming application with Apache Kafka ☆11 · Aug 21, 2022 · Updated 3 years ago
- image captioninggg🐳 ☆12 · Aug 30, 2024 · Updated last year
- Whalegrad 🐳 is a lightweight deep learning library written in C. ☆10 · Jan 5, 2025 · Updated last year
- End to End Machine Learning Pipeline with scikit learn ☆12 · Mar 10, 2021 · Updated 4 years ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆17 · Feb 9, 2026 · Updated 3 weeks ago