☆223 · Feb 21, 2023 · Updated 3 years ago
Alternatives and similar repositories for fly
Users who are interested in fly are comparing it to the libraries listed below.
- Butterfly matrix multiplication in PyTorch ☆179 · Oct 5, 2023 · Updated 2 years ago
- ☆32 · Jan 7, 2024 · Updated 2 years ago
- Implementation of Token Shift GPT: an autoregressive model that relies solely on shifting the sequence space for mixing ☆49 · Jan 27, 2022 · Updated 4 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Jun 12, 2024 · Updated last year
- Implementation of the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning" ☆44 · May 10, 2023 · Updated 2 years ago
- Code for the paper "Normalized Attention Without Probability Cage" ☆17 · Nov 9, 2021 · Updated 4 years ago
- train with kittens! ☆64 · Oct 25, 2024 · Updated last year
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆33 · Jun 2, 2023 · Updated 2 years ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆561 · Dec 28, 2024 · Updated last year
- Reproducing RigL (ICML 2020) as part of the ML Reproducibility Challenge 2020 ☆29 · Jan 6, 2022 · Updated 4 years ago
- Code for testing DCT plus Sparse (DCTpS) networks ☆14 · Jun 15, 2021 · Updated 4 years ago
- Parameter-Efficient Transfer Learning with Diff Pruning ☆74 · Feb 3, 2021 · Updated 5 years ago
- A collection of research papers on efficient training of DNNs ☆69 · Jul 6, 2022 · Updated 3 years ago
- An Efficient and General Framework for Layerwise-Adaptive Gradient Compression ☆14 · Oct 27, 2023 · Updated 2 years ago
- ☆15 · Apr 11, 2024 · Updated 2 years ago
- [ICLR 2023] Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation ☆12 · Jul 31, 2023 · Updated 2 years ago
- ☆12 · Sep 26, 2019 · Updated 6 years ago
- ☆33 · Apr 12, 2021 · Updated 4 years ago
- ☆19 · Jun 3, 2023 · Updated 2 years ago
- End-to-end training of sparse deep neural networks with little-to-no performance loss. ☆335 · Jan 26, 2023 · Updated 3 years ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆20 · Jul 7, 2022 · Updated 3 years ago
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- Blog post ☆17 · Feb 16, 2024 · Updated 2 years ago
- ☆40 · Jan 5, 2024 · Updated 2 years ago
- ☆20 · May 30, 2024 · Updated last year
- Structured Pruning Adapters in PyTorch ☆19 · Aug 30, 2023 · Updated 2 years ago
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 · May 8, 2025 · Updated 11 months ago
- ☆31 · Jul 2, 2023 · Updated 2 years ago
- Block-sparse primitives for PyTorch ☆158 · Apr 5, 2021 · Updated 5 years ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆78 · Apr 2, 2024 · Updated 2 years ago
- ☆63 · Oct 3, 2024 · Updated last year
- [ICML 2021] "Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training" by Shiwei Liu, Lu Yin, De… ☆45 · Nov 11, 2023 · Updated 2 years ago
- BlockCIrculantRNN (LSTM and GRU) using TensorFlow ☆14 · Oct 30, 2018 · Updated 7 years ago
- Sparsity support for PyTorch ☆38 · Mar 22, 2025 · Updated last year
- [ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets" by Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, … ☆18 · Dec 30, 2021 · Updated 4 years ago
- Codebase for the paper "A Gradient Flow Framework for Analyzing Network Pruning" ☆20 · Jan 31, 2021 · Updated 5 years ago
- Implementation for ACProp (Momentum centering and asynchronous update for adaptive gradient methods, NeurIPS 2021) ☆16 · Oct 11, 2021 · Updated 4 years ago
- [COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs ☆64 · Mar 9, 2026 · Updated last month
- ☆26 · Nov 23, 2023 · Updated 2 years ago