VikParuchuri / triton_tutorial

Tutorials for Triton, a language for writing GPU kernels

☆55

Alternatives and similar repositories for triton_tutorial

Users interested in triton_tutorial are comparing it to the libraries listed below.


MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆193 · Updated 4 months ago
huggingface / kernels
Load compute kernels from the Hub
☆304 · Updated this week
cloneofsimo / min-fsdp
☆91 · Updated last year
hkproj / triton-flash-attention
☆209 · Updated 9 months ago
srush / triton-autodiff
An experiment in using Tangent to autodiff Triton
☆80 · Updated last year
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MoE models.
☆202 · Updated 7 months ago
insuhan / hyper-attn
☆83 · Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆166 · Updated 3 months ago
gpu-mode / profiling-cuda-in-torch
☆174 · Updated last year
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLMs
☆120 · Updated 10 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆195 · Updated 4 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆233 · Updated 3 weeks ago
mgmalek / efficient_cross_entropy
☆121 · Updated last year
huggingface / picotron_tutorial
☆222 · Updated 3 weeks ago
zaydzuhri / softpick-attention
Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"
☆85 · Updated last month
changjonathanc / flex-nano-vllm
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
☆296 · Updated 2 months ago
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆91 · Updated 3 months ago
srush / annotated-mamba
Annotated version of the Mamba paper
☆489 · Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆263 · Updated last month
ethansmith2000 / fsdp_optimizers
Supporting PyTorch FSDP for optimizers
☆83 · Updated 10 months ago
open-lm-engine / flash-model-architectures
A bunch of kernels that might make stuff slower 😉
☆62 · Updated this week
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆270 · Updated 2 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆248 · Updated 8 months ago
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆99 · Updated last year
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192 · Updated 11 months ago
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆189 · Updated last year
melisa-writer / short-transformers
Prune transformer layers
☆69 · Updated last year
NX-AI / mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆73 · Updated 2 weeks ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in the Triton language
☆157 · Updated 6 months ago
Edward-Sun / gpt-accelera
Simple and efficient PyTorch-native transformer training and inference (batched)
☆78 · Updated last year