Jokeren / triton-samples
★28 · Updated 8 months ago
Alternatives and similar repositories for triton-samples
Users who are interested in triton-samples are comparing it to the libraries listed below.
- extensible collectives library in triton ★87 · Updated 5 months ago
- A bunch of kernels that might make stuff slower ★59 · Updated this week
- Experiment of using Tangent to autodiff triton ★81 · Updated last year
- ★234 · Updated last week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ★141 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ★57 · Updated this week
- ★111 · Updated last year
- ★175 · Updated last year
- ★41 · Updated last year
- Collection of kernels written in Triton language ★154 · Updated 5 months ago
- ★21 · Updated 6 months ago
- Personal solutions to the Triton Puzzles ★20 · Updated last year
- High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline. ★114 · Updated last year
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ★299 · Updated this week
- Memory Optimizations for Deep Learning (ICML 2023) ★107 · Updated last year
- Fast low-bit matmul kernels in Triton ★365 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ★224 · Updated last year
- How to ship your LLM generated kernels to PyTorch ★49 · Updated this week
- ★88 · Updated 10 months ago
- Ahead of Time (AOT) Triton Math Library ★76 · Updated last week
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ★211 · Updated this week
- Triton-based implementation of Sparse Mixture of Experts. ★239 · Updated 3 weeks ago
- TORCH_LOGS parser for PT2 ★60 · Updated this week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ★53 · Updated last month
- Cataloging released Triton kernels. ★257 · Updated last week
- Make triton easier ★47 · Updated last year
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ★161 · Updated 2 months ago
- Triton-based Symmetric Memory operators and examples ★28 · Updated 3 weeks ago
- High-Performance SGEMM on CUDA devices ★101 · Updated 7 months ago
- ring-attention experiments ★152 · Updated 11 months ago