alexzhang13 / Triton-Puzzles-Solutions
Personal solutions to the Triton Puzzles
★19 · Updated last year
Alternatives and similar repositories for Triton-Puzzles-Solutions
Users interested in Triton-Puzzles-Solutions are comparing it to the libraries listed below.
- Experiment of using Tangent to autodiff Triton ★80 · Updated last year
- A bunch of kernels that might make stuff slower ★56 · Updated last week
- ★28 · Updated 6 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ★48 · Updated this week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ★46 · Updated last year
- ring-attention experiments ★146 · Updated 9 months ago
- ★107 · Updated 11 months ago
- Extensible collectives library in Triton ★88 · Updated 4 months ago
- FlexAttention w/ FlashAttention3 support ★27 · Updated 10 months ago
- Flash-Muon: an efficient implementation of the Muon optimizer ★149 · Updated last month
- Collection of kernels written in the Triton language ★142 · Updated 4 months ago
- JAX bindings for Flash Attention v2 ★91 · Updated last week
- A library for unit scaling in PyTorch ★128 · Updated 3 weeks ago
- A place to store reusable transformer components of my own creation or found on the interwebs ★59 · Updated last week
- A simple yet fast implementation of matrix multiplication in CUDA ★37 · Updated last year
- ★227 · Updated this week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate ★212 · Updated this week
- ★83 · Updated last year
- Fast and memory-efficient exact attention ★69 · Updated 5 months ago
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ★62 · Updated this week
- Make Triton easier ★47 · Updated last year
- ★14 · Updated 2 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN ★71 · Updated last year
- ★39 · Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance ★199 · Updated this week
- This repository contains the experimental PyTorch-native float8 training UX ★224 · Updated last year
- The simplest implementation of recent sparse-attention patterns for efficient LLM inference ★82 · Updated 3 weeks ago
- Triton-based implementation of Sparse Mixture of Experts ★230 · Updated 8 months ago
- Cataloging released Triton kernels ★247 · Updated 6 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI ★137 · Updated last year