gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆ 18 · Updated last month
Alternatives and similar repositories for learn-cuda:
Users interested in learn-cuda are comparing it to the repositories listed below.
- ☆ 21 · Updated last week
- A place to store reusable transformer components of my own creation or found on the interwebs · ☆ 47 · Updated 2 weeks ago
- Collection of autoregressive model implementations · ☆ 83 · Updated 3 weeks ago
- Custom Triton kernels for training Karpathy's nanoGPT · ☆ 17 · Updated 4 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ☆ 125 · Updated last year
- Experiment of using Tangent to autodiff Triton · ☆ 76 · Updated last year
- Make Triton easier · ☆ 47 · Updated 9 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! · ☆ 31 · Updated this week
- ☆ 20 · Updated 10 months ago
- An implementation of the Llama architecture, to instruct and delight · ☆ 21 · Updated last month
- Cataloging released Triton kernels · ☆ 185 · Updated 2 months ago
- ☆ 75 · Updated 8 months ago
- Ring-attention experiments · ☆ 127 · Updated 4 months ago
- ☆ 94 · Updated 9 months ago
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* · ☆ 81 · Updated last year
- ML/DL math and method notes · ☆ 58 · Updated last year
- Various transformers for FSDP research · ☆ 37 · Updated 2 years ago
- See https://github.com/cuda-mode/triton-index/ instead! · ☆ 11 · Updated 10 months ago
- Collection of kernels written in the Triton language · ☆ 110 · Updated 3 weeks ago
- ☆ 86 · Updated last year
- ☆ 20 · Updated last year
- Google TPU optimizations for transformers models · ☆ 102 · Updated last month
- ☆ 43 · Updated last year
- ☆ 27 · Updated 8 months ago
- Custom kernels in the Triton language for accelerating LLMs · ☆ 17 · Updated 11 months ago
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments · ☆ 51 · Updated 11 months ago
- ☆ 148 · Updated last year
- Supporting PyTorch FSDP for optimizers · ☆ 79 · Updated 3 months ago