Deep-Learning-Profiling-Tools / triton-samples

☆14

Alternatives and similar repositories for triton-samples

Users who are interested in triton-samples are comparing it to the libraries listed below.

meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆294 Updated this week
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆169 Updated 7 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆250 Updated last week
triton-lang / kernels
☆94 Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆272 Updated 2 months ago
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆88 Updated 2 months ago
daniel-geon-park / triton_bwd
Automatic differentiation for Triton Kernels
☆30 Updated 3 months ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆102 Updated 5 months ago
cchan / tccl
extensible collectives library in triton
☆91 Updated 8 months ago
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆171 Updated last week
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆65 Updated this week
meta-pytorch / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆47 Updated 3 months ago
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆307 Updated 3 months ago
gpu-mode / ring-attention
ring-attention experiments
☆160 Updated last year
NVIDIA / jaxpp
JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training
☆57 Updated last week
vedantroy / gpu_kernels
☆27 Updated last year
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆398 Updated last week
HanGuo97 / hilt
☆37 Updated 3 weeks ago
gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆117 Updated this week
ColfaxResearch / layout-categories
This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".
☆80 Updated 2 months ago
nvixnu / pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
☆75 Updated 4 years ago
IST-DASLab / qutlass
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
☆140 Updated 2 weeks ago
meta-pytorch / KernelAgent
Autonomous GPU Kernel Generation via Deep Agents
☆163 Updated last week
ColfaxResearch / cutlass-kernels
☆246 Updated last year
wangsiping97 / FastGEMV
High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline.
☆122 Updated last year
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆98 Updated last week
meta-pytorch / BackendBench
How to ensure correctness and ship LLM generated kernels in PyTorch
☆121 Updated 2 weeks ago
stanford-futuredata / stk
☆113 Updated last year
bertmaher / simplegemm
☆126 Updated last month
NVIDIA / compute-eval
Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar…
☆75 Updated last week