☆223Feb 21, 2023Updated 3 years ago
Alternatives and similar repositories for fly
Users that are interested in fly are comparing it to the libraries listed below
Sorting:
- Butterfly matrix multiplication in PyTorch☆179Oct 5, 2023Updated 2 years ago
- ☆32Jan 7, 2024Updated 2 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated last year
- The implementation for MLSys 2023 paper: "Cuttlefish: Low-rank Model Training without All The Tuning"☆45May 10, 2023Updated 2 years ago
- Code publication to the paper "Normalized Attention Without Probability Cage"☆17Nov 9, 2021Updated 4 years ago
- train with kittens!☆64Oct 25, 2024Updated last year
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"☆563Dec 28, 2024Updated last year
- Reproducing RigL (ICML 2020) as a part of ML Reproducibility Challenge 2020☆29Jan 6, 2022Updated 4 years ago
- Code for testing DCT plus Sparse (DCTpS) networks☆14Jun 15, 2021Updated 4 years ago
- Parameter Efficient Transfer Learning with Diff Pruning☆74Feb 3, 2021Updated 5 years ago
- Code for the ICML 2021 and ICLR 2022 papers: Skew Orthogonal Convolutions, Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100☆18Feb 20, 2022Updated 4 years ago
- AN EFFICIENT AND GENERAL FRAMEWORK FOR LAYERWISE-ADAPTIVE GRADIENT COMPRESSION☆14Oct 27, 2023Updated 2 years ago
- ☆15Apr 11, 2024Updated last year
- [ICLR 2023] Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation☆12Jul 31, 2023Updated 2 years ago
- ☆12Sep 26, 2019Updated 6 years ago
- ☆33Apr 12, 2021Updated 4 years ago
- ☆20Jun 3, 2023Updated 2 years ago
- End-to-end training of sparse deep neural networks with little-to-no performance loss.☆335Jan 26, 2023Updated 3 years ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning☆20Jul 7, 2022Updated 3 years ago
- Blog post☆17Feb 16, 2024Updated 2 years ago
- ☆11Oct 11, 2023Updated 2 years ago
- ☆40Jan 5, 2024Updated 2 years ago
- ☆20May 30, 2024Updated last year
- Staged Training for Transformer Language Models☆33Mar 31, 2022Updated 3 years ago
- Structured Pruning Adapters in PyTorch☆19Aug 30, 2023Updated 2 years ago
- Parallel Associative Scan for Language Models☆18Jan 8, 2024Updated 2 years ago
- Curse-of-memory phenomenon of RNNs in sequence modelling☆19May 8, 2025Updated 10 months ago
- ☆31Jul 2, 2023Updated 2 years ago
- Block-sparse primitives for PyTorch☆158Apr 5, 2021Updated 4 years ago
- Simple and efficient pytorch-native transformer training and inference (batched)☆79Apr 2, 2024Updated last year
- ☆63Oct 3, 2024Updated last year
- [ICML 2021] "Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training" by Shiwei Liu, Lu Yin, De…☆45Nov 11, 2023Updated 2 years ago
- BlockCIrculantRNN (LSTM and GRU) using TensorFlow☆14Oct 30, 2018Updated 7 years ago
- [ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets" by Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, …☆18Dec 30, 2021Updated 4 years ago
- Sparsity support for PyTorch☆38Mar 22, 2025Updated last year
- Codebase for the paper "A Gradient Flow Framework for Analyzing Network Pruning"☆20Jan 31, 2021Updated 5 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration☆3,469Jul 17, 2025Updated 8 months ago
- Implementation for ACProp ( Momentum centering and asynchronous update for adaptive gradient methdos, NeurIPS 2021)☆16Oct 11, 2021Updated 4 years ago
- [COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs☆64Mar 9, 2026Updated 2 weeks ago