(WIP) A small but powerful, homemade PyTorch from scratch.
☆676Feb 24, 2026Updated last week
Alternatives and similar repositories for magnetron
Users who are interested in magnetron are comparing it to the libraries listed below.
- Learnings and programs related to CUDA☆433Jun 29, 2025Updated 8 months ago
- small auto-grad engine inspired by Karpathy's micrograd and PyTorch☆275Nov 21, 2024Updated last year
- Minimalistic 4D-parallelism distributed training framework for educational purposes☆2,090Aug 26, 2025Updated 6 months ago
- NanoGPT (124M) in 2 minutes☆4,679Updated this week
- UNet diffusion model in pure CUDA☆657Jun 28, 2024Updated last year
- pytorch from scratch in pure C/CUDA and python☆42Oct 10, 2024Updated last year
- This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.☆1,158Jan 23, 2025Updated last year
- code for training & evaluating Contextual Document Embedding models☆201May 14, 2025Updated 9 months ago
- A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Proc…☆869Mar 29, 2025Updated 11 months ago
- Efficient Triton Kernels for LLM Training☆6,162Updated this week
- a tiny multidimensional array implementation in C similar to numpy, but only one file.☆225Aug 2, 2024Updated last year
- Best practices & guides on how to write distributed pytorch training code☆581Oct 22, 2025Updated 4 months ago
- prime is a framework for efficient, globally distributed training of AI models over the internet.☆851Nov 16, 2025Updated 3 months ago
- Port of Andrej Karpathy's nanoGPT to Apple MLX framework.☆117Feb 12, 2024Updated 2 years ago
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.☆4,752Jul 18, 2025Updated 7 months ago
- ☆23Jan 5, 2025Updated last year
- learning & making kernels in cuda / triton☆22Aug 24, 2025Updated 6 months ago
- creating a tiny tensor library in raw C☆1,313Mar 5, 2025Updated 11 months ago
- A repository consisting of paper/architecture replications of classic/SOTA AI/ML papers in pytorch☆405Nov 11, 2025Updated 3 months ago
- You like pytorch? You like micrograd? You love tinygrad! ❤️☆31,471Updated this week
- LLM training in simple, raw C/CUDA☆28,993Jun 26, 2025Updated 8 months ago
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training☆562Jan 13, 2025Updated last year
- SIMD quantization kernels☆93Sep 7, 2025Updated 5 months ago
- Minimalistic large language model 3D-parallelism training☆2,579Feb 19, 2026Updated last week
- llama3 implementation one matrix multiplication at a time☆15,243May 23, 2024Updated last year
- DeMo: Decoupled Momentum Optimization☆198Dec 2, 2024Updated last year
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)☆478Feb 3, 2026Updated last month
- Solidity contracts for the decentralized Prime Network protocol☆26Jul 6, 2025Updated 7 months ago
- Entropy Based Sampling and Parallel CoT Decoding☆3,434Nov 13, 2024Updated last year
- Getting crystal-like representations with harmonic loss☆194Apr 2, 2025Updated 11 months ago
- A PyTorch native platform for training generative AI models☆5,098Updated this week
- LLM training in simple, raw C/CUDA☆15Dec 5, 2024Updated last year
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI.☆162Oct 19, 2023Updated 2 years ago
- Benchmark your GPU with ease☆30Dec 27, 2025Updated 2 months ago
- High Quality Resources on GPU Programming/Architecture☆593Jul 26, 2024Updated last year
- Training Large Language Model to Reason in a Continuous Latent Space☆1,522Aug 12, 2025Updated 6 months ago
- This repo has all the basic things you'll need in order to understand the complete vision transformer architecture and its various implementa…☆228Jan 2, 2025Updated last year
- RS-IMLE☆44Dec 7, 2024Updated last year
- GPU programming related news and material links☆1,997Sep 17, 2025Updated 5 months ago