mikex86 / tritonc

Standalone command-line tool for compiling Triton kernels

☆20

Alternatives and similar repositories for tritonc

Users who are interested in tritonc are comparing it to the repositories listed below.

UmerHA / triton_util
Make Triton easier
☆48 Updated last year
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆62 Updated last month
srush / triton-autodiff
Experiment in using Tangent to autodiff Triton
☆80 Updated last year
facebookresearch / fastgen
Simple high-throughput inference library
☆149 Updated 6 months ago
lianakoleva / no-libtorch-compile
☆21 Updated 8 months ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 Updated last year
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆46 Updated last year
nunoplopes / torchy
A tracing JIT compiler for PyTorch
☆13 Updated 3 years ago
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆138 Updated 2 months ago
EleutherAI / training-jacobian
☆24 Updated 11 months ago
Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆37 Updated 7 months ago
joey00072 / microjax
JAX-like function transformation engine, but micro: microjax
☆33 Updated last year
eth-easl / fmengine
Utilities for Training Very Large Models
☆58 Updated last year
PrimeIntellect-ai / pi-quant
SIMD quantization kernels
☆92 Updated 2 months ago
WaveSpeedAI / QuantumAttention
[WIP] Better (FP8) attention for Hopper
☆32 Updated 8 months ago
kaiokendev / cutoff-len-is-context-len
Demonstration that fine-tuning a RoPE model on sequences longer than those used in pre-training extends the model's context limit
☆62 Updated 2 years ago
Dao-AILab / gemm-cublas
☆23 Updated 6 months ago
cloneofsimo / zeroshampoo
☆34 Updated last year
codekansas / rwkv
RWKV model implementation
☆38 Updated 2 years ago
Jokeren / triton-samples
☆28 Updated 10 months ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆61 Updated last week
cloneofsimo / min-fsdp
☆91 Updated last year
ezyang / torchdbg
PyTorch centric eager mode debugger
☆48 Updated 11 months ago
HazyResearch / train-tk
train with kittens!
☆63 Updated last year
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21 Updated 5 months ago
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)
☆66 Updated 7 months ago
haileyschoelkopf / triton-index
See https://github.com/cuda-mode/triton-index/ instead!
☆11 Updated last year
sholtodouglas / scalingExperiments
☆62 Updated 3 years ago
Narsil / bloomserver
☆39 Updated 3 years ago
lernapparat / torchhacks
Hacks for PyTorch
☆19 Updated 2 years ago