Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
☆164 · Nov 25, 2025 · Updated 3 months ago
Alternatives and similar repositories for PyNorch
Users that are interested in PyNorch are comparing it to the libraries listed below
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA. ☆29 · Feb 22, 2026 · Updated 2 weeks ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆18 · Feb 9, 2026 · Updated last month
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆167 · Feb 11, 2026 · Updated last month
- A simple implementation of Llama 1, 2. Llama Architecture built from scratch using PyTorch all the models are built from scratch that inc… ☆14 · May 6, 2024 · Updated last year
- SpeechPlus: Small LLM-Based Text-to-Speech Library 🚀 ☆20 · May 20, 2025 · Updated 9 months ago
- Fast GPU based tensor core reductions