lucidrains / triton-transformer

Implementation of a Transformer, but completely in Triton

☆273

Alternatives and similar repositories for triton-transformer

Users who are interested in triton-transformer are comparing it to the libraries listed below.

lucidrains / flash-attention-jax
Implementation of Flash Attention in Jax
☆215 Updated last year
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224 Updated last year
google / aqt
☆323 Updated last month
mgmalek / efficient_cross_entropy
☆114 Updated last year
pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆158 Updated last month
pytorch / torchdistx
Torch Distributed Experimental
☆117 Updated last year
google / praxis
☆187 Updated this week
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆128 Updated 3 weeks ago
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆158 Updated last year
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆206 Updated last week
microsoft / varuna
☆251 Updated last year
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆565 Updated this week
stanford-futuredata / stk
☆107 Updated 11 months ago
google / flaxformer
☆361 Updated last year
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆289 Updated 2 months ago
jax-ml / jax-triton
jax-triton contains integrations between JAX and OpenAI Triton
☆411 Updated last month
AminRezaei0x443 / memory-efficient-attention
Memory Efficient Attention (O(sqrt(n))) for Jax and PyTorch
☆184 Updated 2 years ago
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆258 Updated last week
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆230 Updated 8 months ago
pytorch / rfcs
PyTorch RFCs (experimental)
☆133 Updated 2 months ago
epfml / dynamic-sparse-flash-attention
☆147 Updated 2 years ago
Deep-Learning-Profiling-Tools / triton-viz
☆227 Updated last week
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79 Updated last year
albanD / subclass_zoo
☆171 Updated last year
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆219 Updated last year
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆214 Updated 2 years ago
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆338 Updated last week
gpu-mode / profiling-cuda-in-torch
☆162 Updated last year
nshepperd / flash_attn_jax
JAX bindings for Flash Attention v2
☆90 Updated last week
huggingface / kernels
Load compute kernels from the Hub
☆220 Updated this week