rlin27 / DeBut
Code for the paper "Deformable Butterfly: A Highly Structured and Sparse Linear Transform".
☆12 · Updated 3 years ago
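For context, DeBut's core idea is to replace a dense linear layer with a chain of sparse "butterfly" factors. Below is a minimal NumPy sketch of a standard (non-deformable) butterfly transform, not the repository's code: each of the log2(n) factors has only two nonzeros per row, so applying the chain costs O(n log n) multiplies instead of O(n^2); DeBut generalizes the fixed 2x2 mixing pattern shown here to deformable block shapes.

```python
# Sketch of a plain butterfly factorization (illustration only, not DeBut itself).
import numpy as np

def random_butterfly_factors(n, rng):
    """Build log2(n) sparse factors; factor k mixes index pairs at stride 2**k."""
    assert n > 1 and n & (n - 1) == 0, "n must be a power of two"
    factors = []
    stride = 1
    while stride < n:
        f = np.zeros((n, n))
        for i in range(n):
            j = i ^ stride            # butterfly partner of index i at this stride
            f[i, i] = rng.standard_normal()
            f[i, j] = rng.standard_normal()
        factors.append(f)             # exactly 2 nonzeros per row
        stride *= 2
    return factors

def butterfly_apply(x, factors):
    """Apply the factor chain; 2 nonzeros/row per factor -> O(n log n) work."""
    for f in factors:
        x = f @ x
    return x

rng = np.random.default_rng(0)
n = 8
factors = random_butterfly_factors(n, rng)
x = rng.standard_normal(n)
y = butterfly_apply(x, factors)

# Sanity check: the chain equals one dense n x n matrix (product in reverse order).
dense = np.linalg.multi_dot(factors[::-1])
assert np.allclose(y, dense @ x)
```

The factors are stored as full n x n arrays here only for clarity; a practical implementation keeps just the two nonzero diagonals per factor, which is where the memory and FLOP savings come from.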
Alternatives and similar repositories for DeBut:
Users interested in DeBut are comparing it to the repositories listed below.
- ☆19 · Updated last year
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Updated 2 years ago
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023) ☆30 · Updated last year
- ☆33 · Updated last year
- ☆14 · Updated 2 years ago
- Official code for the paper "Attention as a Hypernetwork" ☆25 · Updated 8 months ago
- ☆21 · Updated 2 years ago
- Awesome Triton Resources ☆20 · Updated 3 months ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Updated 2 years ago
- Xmixers: A collection of SOTA efficient token/channel mixers ☆11 · Updated 4 months ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 · Updated last year
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 · Updated last year
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆19 · Updated 2 years ago
- ☆15 · Updated last year
- Implementation of Hyena Hierarchy in JAX ☆10 · Updated last year
- Official implementation of the paper "A deeper look at depth pruning of LLMs" ☆14 · Updated 7 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆19 · Updated last year
- Scaling Sparse Fine-Tuning to Large Language Models ☆16 · Updated last year
- ☆13 · Updated 2 weeks ago
- ☆29 · Updated 2 years ago
- [Oral; NeurIPS OPT 2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers ☆12 · Updated 3 months ago
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better ☆14 · Updated 3 weeks ago
- ☆18 · Updated 9 months ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆36 · Updated last year
- ☆30 · Updated 9 months ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆17 · Updated 11 months ago
- Code for the paper "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits" ☆13 · Updated 5 months ago
- Official PyTorch implementation of Unsupervised Representation Learning for Binary Networks by Joint Classifier Training (CVPR 2022) ☆10 · Updated 2 years ago