lucasdelimanogueira / PyNorchLinks

Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)

☆160

Alternatives and similar repositories for PyNorch

Users that are interested in PyNorch are comparing it to the libraries listed below

Sorting:

hkproj / triton-flash-attention
☆209Updated 9 months ago
gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆92Updated last month
gpu-mode / profiling-cuda-in-torch
☆174Updated last year
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆421Updated 7 months ago
HenryNdubuaku / cuda-tutorials
CUDA tutorials for Maths & ML tutorials with examples, covers multi-gpus, fused attention, winograd convolution, reinforcement learning.
☆196Updated 4 months ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆202Updated 7 months ago
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆193Updated 4 months ago
hkproj / pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
☆358Updated 2 years ago
Quentin-Anthony / nanoMPI
Simple MPI implementation for prototyping or learning
☆286Updated 2 months ago
evintunador / triton_docs_tutorials
making the official triton tutorials actually comprehensible
☆57Updated 2 months ago
gpu-mode / ring-attention
ring-attention experiments
☆154Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆263Updated last month
linjames0 / Transformer-CUDA
An implementation of the transformer architecture onto an Nvidia CUDA kernel
☆191Updated 2 years ago
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆233Updated 5 months ago
ulrichstern / cuda-convnet
Alex Krizhevsky's original code from Google Code
☆199Updated 9 years ago
huggingface / picotron_tutorial
☆222Updated 3 weeks ago
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆580Updated 2 months ago
thevasudevgupta / gpt-triton
Triton implementation of GPT/LLAMA
☆20Updated last year
a-hamdi / GPU
100 days of building GPU kernels!
☆519Updated 6 months ago
lessw2020 / triton_kernels_for_fun_and_profit
Custom kernels in Triton language for accelerating LLMs
☆26Updated last year
EurekaLabsAI / tensor
The Tensor (or Array)
☆451Updated last year
aryagxr / cuda
coding CUDA everyday!
☆64Updated 6 months ago
ash-01xor / bpe.c
Simple Byte pair Encoding mechanism used for tokenization process . written purely in C
☆137Updated 11 months ago
SzymonOzog / GPU_Programming
☆80Updated last month
andrewkchan / yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
☆518Updated last month
hkproj / 100-days-of-gpu
☆384Updated 6 months ago
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆385Updated last week
Maharshi-Pandya / cudacodes
Learnings and programs related to CUDA
☆422Updated 3 months ago
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆296Updated 2 months ago
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)
☆66Updated 7 months ago