IST-DASLab / sparseprop

☆15

Alternatives and similar repositories for sparseprop

Users who are interested in sparseprop are comparing it to the libraries listed below.

meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆223Updated last year
goodevening13 / aquakv
☆16Updated this week
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆385Updated last week
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132Updated 3 months ago
Cornell-RelaxML / qtip
☆152Updated 4 months ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆84Updated last year
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆159Updated 6 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆102Updated 2 weeks ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
Dao-AILab / fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
☆253Updated last week
IST-DASLab / QuEST
Work in progress.
☆74Updated 4 months ago
google / aqt
☆335Updated last month
facebookresearch / MODel_opt
Memory Optimizations for Deep Learning (ICML 2023)
☆110Updated last year
huggingface / kernels
Load compute kernels from the Hub
☆308Updated this week
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆215Updated last week
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆119Updated 10 months ago
HazyResearch / flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
☆329Updated 10 months ago
vllm-project / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆183Updated this week
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated 10 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆197Updated 4 months ago
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆301Updated 2 months ago
meta-pytorch / superblock
A block oriented training approach for inference time optimization.
☆33Updated last year
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆246Updated 3 weeks ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆156Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆263Updated last month
cchan / tccl
extensible collectives library in triton
☆90Updated 6 months ago
amazon-science / mxfp4-llm
Official implementation for Training LLMs with MXFP4
☆100Updated 6 months ago
IST-DASLab / Quartet
☆103Updated this week
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆118Updated last year
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
☆169Updated last year