triton-lang / triton

Development repository for the Triton language and compiler

☆16,320

Alternatives and similar repositories for triton

Users who are interested in triton are comparing it to the libraries listed below.

openxla / xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
☆3,370 Updated last week
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆13,010 Updated this week
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,255 Updated last year
facebookresearch / xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
☆9,772 Updated this week
ggml-org / ggml
Tensor library for machine learning
☆12,883 Updated this week
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆18,551 Updated this week
jax-ml / jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
☆32,916 Updated this week
NVIDIA / cutlass
CUDA Templates for Linear Algebra Subroutines
☆8,113 Updated this week
triton-inference-server / server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
☆9,531 Updated this week
NVIDIA / TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizati…
☆11,125 Updated this week
google / flax
Flax is a neural network library for JAX that is designed for flexibility.
☆6,712 Updated this week
huggingface / accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i…
☆8,971 Updated last week
bitsandbytes-foundation / bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
☆7,400 Updated last week
facebookincubator / AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…
☆4,662 Updated this week
pytorch-labs / gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
☆6,036 Updated 3 months ago
facebookresearch / fairscale
PyTorch extensions for high performance and large scale training.
☆3,346 Updated 3 months ago
NVIDIA / TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone…
☆11,912 Updated this week
huggingface / trl
Train transformer language models with reinforcement learning.
☆14,736 Updated this week
sgl-project / sglang
SGLang is a fast serving framework for large language models and vision language models.
☆16,386 Updated this week
huggingface / safetensors
Simple, safe way to store and distribute tensors
☆3,356 Updated 3 weeks ago
karpathy / llama2.c
Inference Llama 2 in one file of pure C
☆18,582 Updated 11 months ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla…
☆2,587 Updated this week
karpathy / minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
☆22,341 Updated 11 months ago
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆3,409 Updated last week
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆53,220 Updated this week
jzhang38 / TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
☆8,667 Updated last year
karpathy / nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆43,152 Updated 7 months ago
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆2,532 Updated this week
iree-org / iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
☆3,241 Updated this week
apache / tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
☆12,477 Updated this week