VikParuchuri / triton_tutorial
Tutorials for Triton, a language for writing GPU kernels
☆71 · Updated 2 years ago
Alternatives and similar repositories for triton_tutorial
Users interested in triton_tutorial are comparing it to the repositories listed below
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆195 · Updated 7 months ago
- Load compute kernels from the Hub ☆359 · Updated last week
- ☆92 · Updated last year
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ☆328 · Updated 2 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆185 · Updated 6 months ago
- Experiment of using Tangent to autodiff Triton ☆81 · Updated last year
- ☆178 · Updated last year
- Flash-Muon: An Efficient Implementation of the Muon Optimizer ☆225 · Updated 7 months ago
- Code for studying the super weight in LLMs ☆120 · Updated last year
- ☆124 · Updated last year
- Accelerated First-Order Parallel Associative Scan ☆193 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and an SDPA implementation of Flash… ☆278 · Updated last month
- This repository contains the experimental PyTorch-native float8 training UX ☆227 · Updated last year
- Understand and test language model architectures on synthetic tasks. ☆249 · Updated this week
- ☆224 · Updated last month
- Ring-attention experiments ☆161 · Updated last year
- Normalized Transformer (nGPT) ☆195 · Updated last year
- Cataloging released Triton kernels. ☆282 · Updated 4 months ago
- An extension of the nanoGPT repository for training small MoE models. ☆225 · Updated 10 months ago
- Custom Triton kernels for training Karpathy's nanoGPT. ☆19 · Updated last year
- The simplest implementation of recent sparse-attention patterns for efficient LLM inference. ☆91 · Updated 6 months ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆86 · Updated 4 months ago
- Triton-based implementation of a Sparse Mixture of Experts. ☆260 · Updated 3 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆154 · Updated 2 years ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆592 · Updated 5 months ago
- A library for unit scaling in PyTorch ☆133 · Updated 6 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆244 · Updated 7 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆136 · Updated last year
- Minimal (400 LOC) implementation of Maximum (multi-node, FSDP) GPT training ☆132 · Updated last year
- Awesome Triton Resources ☆39 · Updated 8 months ago