HazyResearch / fly
☆194 · Updated last year
Related projects
Alternatives and complementary repositories for fly
- Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer" ☆100 · Updated last year
- Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning" ☆104 · Updated last year
- ☆132 · Updated last year
- [NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers ☆171 · Updated last year
- Soft Threshold Weight Reparameterization for Learnable Sparsity ☆88 · Updated last year
- Block-sparse primitives for PyTorch ☆148 · Updated 3 years ago
- PyTorch package implementing PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022) ☆41 · Updated 2 years ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆86 · Updated 9 months ago
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021) ☆56 · Updated 2 years ago
- Block-sparse movement pruning ☆78 · Updated 3 years ago
- A research library for PyTorch-based neural network pruning, compression, and more ☆160 · Updated last year
- ☆214 · Updated 2 years ago
- Official implementation of the ICLR 2022 paper "BiBERT: Accurate Fully Binarized BERT" ☆84 · Updated last year
- [KDD'22] Learned Token Pruning for Transformers ☆93 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts ☆185 · Updated last month
- Butterfly matrix multiplication in PyTorch ☆164 · Updated last year
- Revisiting Efficient Training Algorithms for Transformer-based Language Models (NeurIPS 2023) ☆79 · Updated last year
- ☆207 · Updated 6 months ago
- Parameter-Efficient Transfer Learning with Diff Pruning ☆72 · Updated 3 years ago
- Integer operators on GPUs for PyTorch ☆183 · Updated last year
- ☆11 · Updated 2 years ago
- ☆156 · Updated last year
- ☆195 · Updated 3 years ago
- Implementation of Continuous Sparsification, a method for pruning and ticket search in deep networks ☆32 · Updated 2 years ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆111 · Updated 5 months ago
- Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ☆51 · Updated 4 months ago
- [ICML'21 Oral] I-BERT: Integer-only BERT Quantization ☆229 · Updated last year
- [ICLR 2022] "Unified Vision Transformer Compression" by Shixing Yu*, Tianlong Chen*, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Li… ☆48 · Updated 11 months ago
- Code for the paper "Why Transformers Need Adam: A Hessian Perspective" ☆42 · Updated 6 months ago
- Repository for the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆109 · Updated 8 months ago