gau-nernst / quantized-training

Explore training for quantized models

☆20

Alternatives and similar repositories for quantized-training

Users who are interested in quantized-training are comparing it to the libraries listed below.

mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆339 Updated this week
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆289 Updated 2 months ago
cchan / tccl
extensible collectives library in triton
☆88 Updated 4 months ago
wangsiping97 / FastGEMV
High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline.
☆113 Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆247 Updated 6 months ago
Dao-AILab / fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
☆213 Updated last year
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆142 Updated 4 months ago
open-lm-engine / flash-model-architectures
A bunch of kernels that might make stuff slower 😉
☆56 Updated last week
Deep-Learning-Profiling-Tools / triton-viz
☆227 Updated this week
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224 Updated last year
usyd-fsalab / fp6_llm
An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6,FP5).
☆260 Updated 3 weeks ago
IST-DASLab / qutlass
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
☆57 Updated 3 weeks ago
triton-lang / kernels
☆85 Updated 9 months ago
stanford-futuredata / stk
☆107 Updated 11 months ago
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆158 Updated last year
ColfaxResearch / cutlass-kernels
☆228 Updated last year
pytorch-labs / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆199 Updated this week
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆80 Updated 11 months ago
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆205 Updated 3 months ago
INT-FlashAttention2024 / INT-FlashAttention
☆79 Updated 6 months ago
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆212 Updated this week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆388 Updated this week
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆107 Updated 2 months ago
tspeterkim / paged-attention-minimal
a minimal cache manager for PagedAttention, on top of llama3.
☆110 Updated 11 months ago
neuralmagic / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆142 Updated this week
NVIDIA / online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper
☆96 Updated 7 years ago
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆318 Updated last year
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆206 Updated last week
yifuwang / symm-mem-recipes
☆102 Updated 7 months ago
pranjalssh / fast.cu
Fastest kernels written from scratch
☆310 Updated 4 months ago