microsoft / TextNAS
This is the implementation of the TextNAS algorithm proposed in the paper "TextNAS: A Neural Architecture Search Space tailored for Text Representation".
☆15Updated last year
Related projects:
- Factorized Neural Layers☆27Updated last year
- This package implements THOR: Transformer with Stochastic Experts.☆60Updated 2 years ago
- [NeurIPS 2022] DataMUX: Data Multiplexing for Neural Networks☆58Updated last year
- We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural language prompts…☆91Updated last month
- some common Huggingface transformers in maximal update parametrization (µP)☆76Updated 2 years ago
- Block Sparse movement pruning☆77Updated 3 years ago
- RL algorithm: Advantage induced policy alignment☆62Updated last year
- ☆187Updated last year
- ☆234Updated last month
- [NeurIPS'23] Speculative Decoding with Big Little Decoder☆84Updated 7 months ago
- [NeurIPS 2020] "The Lottery Ticket Hypothesis for Pre-trained BERT Networks", Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Ya…☆137Updated 2 years ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021)☆58Updated 2 years ago
- Official PyTorch Implementation of Length-Adaptive Transformer (ACL 2021)☆100Updated 3 years ago
- Implementation of a Transformer, but completely in Triton☆242Updated 2 years ago
- Building modular LMs with parameter-efficient fine-tuning.☆73Updated this week
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).☆54Updated 2 years ago
- Research and development for optimizing transformers☆121Updated 3 years ago
- [JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion☆40Updated 3 years ago
- OSLO: Open Source framework for Large-scale model Optimization☆306Updated 2 years ago
- Official code for "Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Reso…☆18Updated 11 months ago
- MLPruning, PyTorch, NLP, BERT, Structured Pruning☆21Updated 3 years ago
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)☆115Updated 2 years ago
- Method to improve inference time for BERT. This is an implementation of the paper titled "PoWER-BERT: Accelerating BERT Inference via Pro…☆58Updated last year
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference☆58Updated this week
- This is the implementation of the paper AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning (https://arxiv.org/abs/2205.1…☆124Updated last year
- Implementation of the article **Deep Learning CUDA Memory Usage and Pytorch optimization tricks**☆42Updated 4 years ago
- Fault-aware neural code rankers☆23Updated last year
- Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent☆14Updated 2 years ago
- PyTorch library for factorized L0-based pruning.☆42Updated 11 months ago
- [ICLR 2023] "Learning to Grow Pretrained Models for Efficient Transformer Training" by Peihao Wang, Rameswar Panda, Lucas Torroba Hennige…☆81Updated 6 months ago
- ☆12Updated 2 years ago