gpu-mode / triton-tutorials
☆15 · Updated 5 months ago
Alternatives and similar repositories for triton-tutorials
Users who are interested in triton-tutorials are comparing it to the libraries listed below.
- A place to store reusable transformer components of my own creation or found on the interwebs ☆59 · Updated 2 weeks ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆73 · Updated last year
- Make triton easier ☆48 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated last year
- Personal solutions to the Triton Puzzles ☆20 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆48 · Updated last year
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆36 · Updated last year
- DPO, but faster 🚀 ☆45 · Updated 10 months ago
- Using FlexAttention to compute attention with different masking patterns ☆47 · Updated last year
- ☆57 · Updated last year
- Experiment of using Tangent to autodiff triton ☆80 · Updated last year
- Hacks for PyTorch ☆19 · Updated 2 years ago
- PyTorch DTensor native training library for LLMs/VLMs with OOTB Hugging Face support ☆141 · Updated this week
- A collection of reproducible inference engine benchmarks ☆37 · Updated 6 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 · Updated last year
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last year
- Odysseus: Playground of LLM Sequence Parallelism ☆78 · Updated last year
- ☆91 · Updated last year
- Context Manager to profile the forward and backward times of PyTorch's nn.Module ☆82 · Updated 2 years ago
- ring-attention experiments ☆155 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆102 · Updated 3 weeks ago
- Linear Attention Sequence Parallelism (LASP) ☆87 · Updated last year
- This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… ☆91 · Updated 2 years ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆197 · Updated 4 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆60 · Updated last year
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. ☆72 · Updated 2 weeks ago
- PyTorch centric eager mode debugger ☆48 · Updated 10 months ago
- ☆34 · Updated 4 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆26 · Updated 2 years ago
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆25 · Updated 3 months ago