cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)
☆65 Updated last month
Alternatives and similar repositories for ptx-tutorial-by-aislop:
Users who are interested in ptx-tutorial-by-aislop are comparing it to the libraries listed below
- Learning about CUDA by writing PTX code. ☆128 Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆178 Updated this week
- High-Performance SGEMM on CUDA devices ☆90 Updated 3 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆62 Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆40 Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆165 Updated last month
- ☆71 Updated this week
- Experimental GPU language with meta-programming ☆22 Updated 7 months ago
- ☆153 Updated last year
- making the official triton tutorials actually comprehensible ☆26 Updated last month
- extensible collectives library in triton ☆85 Updated 3 weeks ago
- pytorch from scratch in pure C/CUDA and python ☆40 Updated 6 months ago
- ☆87 Updated last year
- Small scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆130 Updated last year
- Fast low-bit matmul kernels in Triton ☆291 Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆111 Updated this week
- Load compute kernels from the Hub ☆115 Updated this week
- Collection of autoregressive model implementations ☆85 Updated 2 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆72 Updated 7 months ago
- ☆78 Updated 9 months ago
- Make triton easier ☆47 Updated 10 months ago
- Experiment of using Tangent to autodiff triton ☆78 Updated last year
- ring-attention experiments ☆130 Updated 6 months ago
- ☆31 Updated 3 months ago
- RWKV-7: Surpassing GPT