HazyResearch / fly
☆194 · Updated last year
Related projects
Alternatives and complementary repositories for fly
- Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer" ☆100 · Updated last year
- Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning" ☆104 · Updated last year
- ☆132 · Updated last year
- [NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers ☆171 · Updated last year
- Soft Threshold Weight Reparameterization for Learnable Sparsity ☆88 · Updated last year
- Block-sparse primitives for PyTorch ☆148 · Updated 3 years ago
- PyTorch package implementing PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022) ☆41 · Updated 2 years ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆86 · Updated 9 months ago
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021) ☆56 · Updated 2 years ago
- Block-sparse movement pruning ☆78 · Updated 3 years ago
- A research library for PyTorch-based neural network pruning, compression, and more ☆160 · Updated last year
- ☆214 · Updated 2 years ago
- Official implementation of the ICLR 2022 paper "BiBERT: Accurate Fully Binarized BERT" ☆84 · Updated last year
- [KDD'22] Learned Token Pruning for Transformers ☆93 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts ☆185 · Updated last month
- Butterfly matrix multiplication in PyTorch ☆164 · Updated last year
- Revisiting Efficient Training Algorithms for Transformer-based Language Models (NeurIPS 2023) ☆79 · Updated last year
- ☆207 · Updated 6 months ago
- Parameter-Efficient Transfer Learning with Diff Pruning ☆72 · Updated 3 years ago
- Integer operators on GPUs for PyTorch ☆183 · Updated last year
- ☆11 · Updated 2 years ago
- ☆156 · Updated last year
- ☆195 · Updated 3 years ago
- Implementation of Continuous Sparsification, a method for pruning and ticket search in deep networks ☆32 · Updated 2 years ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆111 · Updated 5 months ago
- Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ☆51 · Updated 4 months ago
- [ICML'21 Oral] I-BERT: Integer-only BERT Quantization ☆229 · Updated last year
- [ICLR 2022] "Unified Vision Transformer Compression" by Shixing Yu*, Tianlong Chen*, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Li… ☆48 · Updated 11 months ago
- Code for the paper "Why Transformers Need Adam: A Hessian Perspective" ☆42 · Updated 6 months ago
- Repository for the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆109 · Updated 8 months ago