Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
★166 · Nov 25, 2025 · Updated 6 months ago
Alternatives and similar repositories for PyNorch
Users who are interested in PyNorch are comparing it to the libraries listed below.
- SpeechPlus: Small LLM-Based Text-to-Speech Library ★21 · May 20, 2025 · Updated last year
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ★19 · Feb 9, 2026 · Updated 4 months ago
- Machine translation with tinygrad ★19 · Apr 7, 2024 · Updated 2 years ago
- Alex Krizhevsky's original code from Google Code ★199 · Mar 10, 2016 · Updated 10 years ago
- ★46 · May 24, 2025 · Updated last year
- UNet diffusion model in pure CUDA ★659 · Jun 28, 2024 · Updated last year
- This repo implements and trains DallE-1 on a synthetically generated dataset which has colored mnist images on texture/solid background a… ★14 · Oct 30, 2024 · Updated last year
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA. ★33 · May 26, 2026 · Updated 2 weeks ago
- A really tiny autograd engine ★100 · May 26, 2025 · Updated last year
- High-Performance FP32 GEMM on CUDA devices ★125 · Jan 21, 2025 · Updated last year
- Implementation of FlashAttention (FA1-FA4) in PyTorch for educational and algorithmic clarity ★219 · Apr 12, 2026 · Updated last month
- Conformer block with Rotary Position Embedding, modified from lucidrains' implement ★19 · Sep 13, 2024 · Updated last year
- The official evaluation suite and dynamic data release for MixEval. ★11 · Sep 23, 2024 · Updated last year
- High Performance Int8 GEMM Kernels for SM80 and later GPUs. ★23 · Mar 11, 2025 · Updated last year
- Comprehensive CUDA tutorials for Maths & ML with examples ★227 · Jun 11, 2025 · Updated last year
- Optimize tensor program fast with Felix, a gradient descent autotuner. ★33 · Mar 5, 2026 · Updated 3 months ago
- My submission for the GPUMODE/AMD fp8 mm challenge ★29 · Jun 4, 2025 · Updated last year
- Fast GPU based tensor core reductions ★13 · Jan 13, 2023 · Updated 3 years ago
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism. ★125 · Dec 29, 2025 · Updated 5 months ago
- ★17 · Oct 5, 2024 · Updated last year
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning ★36 · Jun 13, 2025 · Updated 11 months ago
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer ★192 · Feb 11, 2026 · Updated 4 months ago
- C Compiler written in Kotlin ★13 · Apr 19, 2024 · Updated 2 years ago
- Dive into Big Model Training ★117 · Dec 1, 2022 · Updated 3 years ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes ★2,216 · Aug 26, 2025 · Updated 9 months ago
- An Easy-to-understand TensorOp Matmul Tutorial ★440 · Mar 5, 2026 · Updated 3 months ago
- A zero-config OpenAI client with support for 20+ providers, API key rotation, rate limits, optional LangChain integration and more. ★19 · Dec 11, 2025 · Updated 5 months ago
- flash attention tutorial written in python, triton, cuda, cutlass ★521 · Jan 20, 2026 · Updated 4 months ago
- ★22 · May 26, 2025 · Updated last year
- A collection of reusable, high-performance, well-documented, thoroughly tested layers and models in Jax ★24 · Jun 8, 2025 · Updated last year
- Text Normalization utilities for normalizing text for TTS ★22 · Mar 4, 2026 · Updated 3 months ago
- ★17 · Jan 22, 2023 · Updated 3 years ago
- A zero-dependency ML framework in C with a modern Python API for full control over execution and memory. ★687 · Updated this week
- A Straightforward, Step-by-Step Implementation of a Video Diffusion Model ★84 · Aug 18, 2025 · Updated 9 months ago
- ★13 · Jan 16, 2025 · Updated last year
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ★17 · Apr 22, 2025 · Updated last year
- ★12 · Sep 25, 2024 · Updated last year
- A C11 compiler for the discrete logic computer ★21 · Apr 3, 2024 · Updated 2 years ago
- Implementation of the paper on Embodiment Scaling Laws in Robot Locomotion (CoRL 2025) ★26 · Sep 23, 2025 · Updated 8 months ago