pytorch-labs / float8_experimental

This repository contains the experimental PyTorch-native float8 training UX.

☆224

Alternatives and similar repositories for float8_experimental

Users who are interested in float8_experimental are comparing it to the libraries listed below.


pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆289 Updated 2 months ago
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆338 Updated last week
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆206 Updated last week
Deep-Learning-Profiling-Tools / triton-viz
☆227 Updated last week
cchan / tccl
Extensible collectives library in Triton
☆88 Updated 4 months ago
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆258 Updated last week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆388 Updated this week
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆230 Updated 8 months ago
gpu-mode / triton-index
Cataloging released Triton kernels.
☆247 Updated 6 months ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆142 Updated 4 months ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆128 Updated 3 weeks ago
gpu-mode / ring-attention
ring-attention experiments
☆146 Updated 9 months ago
pytorch-labs / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆199 Updated this week
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆158 Updated last year
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆565 Updated last week
stanford-futuredata / stk
☆107 Updated 11 months ago
pytorch-labs / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆200 Updated this week
huggingface / kernels
Load compute kernels from the Hub
☆220 Updated this week
google / aqt
☆323 Updated this week
Dao-AILab / fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
☆213 Updated last year
gpu-mode / profiling-cuda-in-torch
☆162 Updated last year
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆107 Updated 2 months ago
mgmalek / efficient_cross_entropy
☆114 Updated last year
HazyResearch / Megakernels
kernels, of the mega variety
☆466 Updated 2 months ago
neuralmagic / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆142 Updated this week
open-lm-engine / flash-model-architectures
A bunch of kernels that might make stuff slower 😉
☆56 Updated this week
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆118 Updated last year
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆273 Updated 3 years ago
usyd-fsalab / fp6_llm
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
☆260 Updated 3 weeks ago
facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆400 Updated last week