daemyung / practice-triton

Triangle in practice! Triton

☆16

Alternatives and similar repositories for practice-triton

Users that are interested in practice-triton are comparing it to the libraries listed below

kakaobrain / trident
A performance library for machine learning applications.
☆184 Updated 2 years ago
EleutherAI / oslo
OSLO: Open Source for Large-scale Optimization
☆174 Updated 2 years ago
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆118 Updated last year
goddoe / RLYX
A hackable, simple, and research-friendly GRPO training framework with high-speed weight synchronization in a multinode environment.
☆31 Updated last month
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆87 Updated last year
hpcaitech / Elixir
Elixir: Train a Large Language Model on a Small GPU Cluster
☆15 Updated 2 years ago
vedantroy / gpu_kernels
☆27 Updated last year
HeegyuKim / torch-xla-SPMD
PyTorch/XLA SPMD test code on Google TPU
☆23 Updated last year
swsnu / aisys2023
☆103 Updated 2 years ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆83 Updated last year
graphcore-research / pytorch-tensor-tracker
Flexibly track outputs and grad-outputs of torch.nn.Module.
☆13 Updated 2 years ago
insuhan / hyper-attn
☆83 Updated last year
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆223 Updated last year
gpu-mode / ring-attention
ring-attention experiments
☆153 Updated 11 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 Updated last year
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆89 Updated 2 months ago
mgmalek / efficient_cross_entropy
☆120 Updated last year
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆161 Updated 6 months ago
kssteven418 / BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆94 Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆82 Updated 3 years ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆130 Updated 10 months ago
google-deepmind / asyncdiloco
☆46 Updated last year
amazon-science / mxfp4-llm
Official implementation for Training LLMs with MXFP4
☆96 Updated 5 months ago
jaymody / speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆96 Updated 2 years ago
tspeterkim / mixed-precision-from-scratch
Mixed precision training from scratch with Tensors and CUDA
☆27 Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60 Updated last year
friendliai / FAI-Model
FriendliAI Model Hub
☆91 Updated 3 years ago
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆203 Updated 3 months ago
DeepAuto-AI / sglang
This is a fork of SGLang for hip-attention integration. Please refer to hip-attention for details.
☆17 Updated this week
HanGuo97 / lq-lora
☆127 Updated last year