thevasudevgupta / gpt-triton
Triton implementation of GPT/LLAMA
☆18 Updated 8 months ago
Alternatives and similar repositories for gpt-triton
Users that are interested in gpt-triton are comparing it to the libraries listed below
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆180 Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆171 Updated this week
- ☆155 Updated last year
- ☆79 Updated 10 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆131 Updated last year
- ☆163 Updated 4 months ago
- ☆88 Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆70 Updated 11 months ago
- ring-attention experiments ☆140 Updated 6 months ago
- Load compute kernels from the Hub ☆116 Updated this week
- Code for studying the super weight in LLM ☆100 Updated 5 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆126 Updated 5 months ago
- Collection of autoregressive model implementation ☆85 Updated 2 weeks ago
- VIT inference in triton because, why not? ☆27 Updated 11 months ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆36 Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 Updated this week
- Cataloging released Triton kernels. ☆220 Updated 4 months ago
- ☆202 Updated 2 weeks ago
- Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆51 Updated last year
- making the official triton tutorials actually comprehensible ☆28 Updated last month
- Triton-based implementation of Sparse Mixture of Experts. ☆212 Updated 5 months ago
- ☆184 Updated 2 months ago
- Mixed precision training from scratch with Tensors and CUDA ☆22 Updated 11 months ago
- extensible collectives library in triton ☆86 Updated last month
- Fast low-bit matmul kernels in Triton ☆297 Updated this week
- Collection of kernels written in Triton language ☆122 Updated last month
- Simple and efficient pytorch-native transformer training and inference (batched) ☆75 Updated last year
- Prune transformer layers ☆69 Updated 11 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆60 Updated 3 months ago
- Experiment of using Tangent to autodiff triton ☆78 Updated last year