thevasudevgupta / gpt-triton
Triton implementation of GPT/LLAMA
☆16 Updated 6 months ago
Alternatives and similar repositories for gpt-triton:
Users who are interested in gpt-triton are comparing it to the libraries listed below:
- ☆131 Updated 2 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆125 Updated last year
- Cataloging released Triton kernels. ☆191 Updated 2 months ago
- ☆148 Updated last year
- Custom kernels in Triton language for accelerating LLMs ☆17 Updated 11 months ago
- ☆75 Updated 8 months ago
- ☆156 Updated last month
- ring-attention experiments ☆127 Updated 4 months ago
- ☆188 Updated 3 weeks ago
- ☆86 Updated last year
- Collection of autoregressive model implementations ☆83 Updated last month
- High-Performance SGEMM on CUDA devices ☆86 Updated last month
- Collection of kernels written in Triton language ☆110 Updated 3 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆222 Updated 7 months ago
- Prune transformer layers ☆68 Updated 9 months ago
- Learn CUDA with PyTorch ☆18 Updated last month
- Experiment in using Tangent to autodiff Triton ☆76 Updated last year
- Fast low-bit matmul kernels in Triton ☆257 Updated last week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆100 Updated 3 months ago
- Applied AI experiments and examples for PyTorch ☆243 Updated this week
- Normalized Transformer (nGPT) ☆156 Updated 3 months ago
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆51 Updated 11 months ago
- Supporting PyTorch FSDP for optimizers ☆79 Updated 3 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆227 Updated last week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆123 Updated 3 months ago
- LLM training in simple, raw C/CUDA ☆92 Updated 10 months ago
- Complete implementation of Llama2 with/without KV cache & inference 🚀 ☆47 Updated 9 months ago