thevasudevgupta / gpt-triton
Triton implementation of GPT/LLaMA
☆19 · Updated 9 months ago
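For context, here is a minimal sketch of the kind of Triton kernel that repositories like this are built from: an element-wise add following the standard Triton tutorial pattern. It is illustrative only, not code from gpt-triton.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the input.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged tail of the last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    # Launch one program per BLOCK_SIZE elements.
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Real GPT/LLaMA kernels (attention, matmul, norm) follow the same structure: a grid of program instances, masked block loads, and block-level compute.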
Alternatives and similar repositories for gpt-triton
Users interested in gpt-triton are comparing it to the libraries listed below.
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆185 · Updated 3 weeks ago
- ring-attention experiments ☆144 · Updated 8 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆134 · Updated last year
- ☆159 · Updated last year
- ☆174 · Updated 5 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆189 · Updated last month
- ☆219 · Updated this week
- Collection of kernels written in the Triton language ☆132 · Updated 2 months ago
- Learn CUDA with PyTorch ☆27 · Updated this week
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆70 · Updated last year
- ☆88 · Updated last year
- ☆78 · Updated 11 months ago
- ☆193 · Updated 4 months ago
- Mixed-precision training from scratch with Tensors and CUDA ☆24 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆69 · Updated last week
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆76 · Updated last year
- Load compute kernels from the Hub ☆191 · Updated last week
- Cataloging released Triton kernels. ☆238 · Updated 5 months ago
- Custom Triton kernels for training Karpathy's nanoGPT. ☆19 · Updated 8 months ago
- Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency ☆25 · Updated 10 months ago
- This repository contains the experimental PyTorch-native float8 training UX ☆224 · Updated 10 months ago
- Applied AI experiments and examples for PyTorch ☆277 · Updated 3 weeks ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆77 · Updated last week
- Understand and test language model architectures on synthetic tasks. ☆218 · Updated 2 weeks ago
- Fast low-bit matmul kernels in Triton ☆322 · Updated last week
- LLM training in simple, raw C/CUDA ☆99 · Updated last year
- ☆109 · Updated last year
- Prune transformer layers ☆69 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆46 · Updated this week
- ☆59 · Updated this week