thevasudevgupta / gpt-triton
Triton implementation of GPT/LLAMA
☆15 · Updated 2 months ago
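For context on what a Triton implementation of a GPT/LLaMA model involves, below is a minimal sketch of a row-wise softmax kernel written in the Triton language. It is an illustrative example only, not code from gpt-triton or any repository listed below; the names `softmax_kernel` and `softmax` are hypothetical, and the sketch assumes a contiguous 2-D CUDA tensor.

```python
# Minimal illustrative Triton kernel (row-wise softmax) -- not taken from gpt-triton.
import torch
import triton
import triton.language as tl


@triton.jit
def softmax_kernel(out_ptr, in_ptr, n_cols, BLOCK_SIZE: tl.constexpr):
    # Each program instance normalizes one row of a contiguous (n_rows, n_cols) matrix.
    row = tl.program_id(0)
    offsets = tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_cols
    x = tl.load(in_ptr + row * n_cols + offsets, mask=mask, other=float("-inf"))
    x = x - tl.max(x, axis=0)            # subtract the row max for numerical stability
    num = tl.exp(x)
    out = num / tl.sum(num, axis=0)
    tl.store(out_ptr + row * n_cols + offsets, out, mask=mask)


def softmax(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical wrapper: launch one program per row of a 2-D CUDA tensor."""
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)  # tl.arange needs a power-of-2 length
    softmax_kernel[(n_rows,)](out, x, n_cols, BLOCK_SIZE=BLOCK_SIZE)
    return out


# Usage (requires a CUDA GPU): softmax(torch.randn(4, 1024, device="cuda"))
```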
Related projects
Alternatives and complementary repositories for gpt-triton
- Cataloging released Triton kernels. ☆133 · Updated 2 months ago
- ring-attention experiments ☆96 · Updated 3 weeks ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 · Updated last month
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆103 · Updated last year
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆184 · Updated last month
- Understand and test language model architectures on synthetic tasks. ☆161 · Updated 6 months ago
- Prune transformer layers ☆64 · Updated 5 months ago
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆140 · Updated this week
- extensible collectives library in triton ☆65 · Updated last month
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆70 · Updated last week
- Triton-based implementation of Sparse Mixture of Experts. ☆184 · Updated last month
- Applied AI experiments and examples for PyTorch ☆160 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆211 · Updated 3 months ago
- code for training & evaluating Contextual Document Embedding models ☆93 · Updated this week
- Learn CUDA with PyTorch ☆14 · Updated this week
- Collection of kernels written in Triton language ☆63 · Updated 2 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 · Updated last week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆46 · Updated this week
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆66 · Updated 5 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆206 · Updated 2 weeks ago
- Minimal (400 LOC) implementation, Maximum (multi-node, FSDP) GPT training ☆112 · Updated 6 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆86 · Updated 3 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 3 weeks ago
- Code for Palu: Compressing KV-Cache with Low-Rank Projection ☆56 · Updated this week
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆111 · Updated 2 months ago