gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆19 · Updated last month
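As a rough illustration of the workflow the repository's title describes (this sketch is not taken from learn-cuda), the snippet below compiles an inline CUDA kernel and calls it on a GPU tensor through `torch.utils.cpp_extension.load_inline`; the kernel, function, and extension names are illustrative assumptions.

```python
# Minimal sketch of "CUDA with PyTorch": build an inline CUDA extension
# and call the compiled kernel on a CUDA tensor. Requires a CUDA toolchain.
# Names below (square_kernel, square, square_ext) are made up for this example.
import torch
from torch.utils.cpp_extension import load_inline

cuda_source = r"""
#include <torch/extension.h>

__global__ void square_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

torch::Tensor square(torch::Tensor x) {
    auto out = torch::empty_like(x);
    int n = x.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    square_kernel<<<blocks, threads>>>(
        x.data_ptr<float>(), out.data_ptr<float>(), n);
    return out;
}
"""

# Declaration only; load_inline generates the Python binding for "square".
cpp_source = "torch::Tensor square(torch::Tensor x);"

ext = load_inline(
    name="square_ext",
    cpp_sources=cpp_source,
    cuda_sources=cuda_source,
    functions=["square"],
)

x = torch.arange(8, dtype=torch.float32, device="cuda")
print(ext.square(x))  # elementwise squares computed by the custom kernel
```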
Alternatives and similar repositories for learn-cuda:
Users interested in learn-cuda are comparing it to the repositories listed below.
- ☆21 · Updated 2 weeks ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆34 · Updated this week
- Experiment of using Tangent to autodiff triton ☆78 · Updated last year
- A place to store reusable transformer components of my own creation or found on the interwebs ☆48 · Updated last week
- ☆20 · Updated last year
- Custom triton kernels for training Karpathy's nanoGPT. ☆18 · Updated 5 months ago
- Make triton easier ☆47 · Updated 9 months ago
- ML/DL Math and Method notes ☆58 · Updated last year
- An implementation of the Llama architecture, to instruct and delight ☆21 · Updated 2 months ago
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 · Updated 10 months ago
- Collection of autoregressive model implementations ☆83 · Updated last month
- ☆17 · Updated last year
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆81 · Updated last year
- Custom kernels in Triton language for accelerating LLMs ☆18 · Updated 11 months ago
- ☆20 · Updated 11 months ago
- Various transformers for FSDP research ☆37 · Updated 2 years ago
- ☆76 · Updated 8 months ago
- ring-attention experiments ☆127 · Updated 5 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆127 · Updated last year
- ☆43 · Updated last year
- ☆46 · Updated last year
- ☆47 · Updated 6 months ago
- ☆95 · Updated 9 months ago
- An introduction to LLM Sampling ☆77 · Updated 3 months ago
- extensible collectives library in triton ☆84 · Updated 6 months ago
- Cataloging released Triton kernels. ☆204 · Updated 2 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 8 months ago