microsoft / AutoMoE
AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers
☆46 · Updated 2 years ago
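For context on what "sparsely activated" means in the repositories listed here, below is a minimal sketch of a Mixture-of-Experts feed-forward layer with top-k routing. It is illustrative only and not taken from the AutoMoE codebase; the dimensions, expert count, and `top_k` value are arbitrary assumptions for the example.

```python
# Toy sparsely activated (Mixture-of-Experts) feed-forward layer with top-k routing.
# Illustrative NumPy sketch only -- not AutoMoE's implementation; all shapes and
# hyperparameters below are assumptions made for this example.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MoEFeedForward:
    def __init__(self, d_model=16, d_ff=32, num_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Router: one linear map from token representation to expert scores.
        self.w_gate = rng.normal(scale=0.02, size=(d_model, num_experts))
        # Each expert is a small two-layer MLP.
        self.w1 = rng.normal(scale=0.02, size=(num_experts, d_model, d_ff))
        self.w2 = rng.normal(scale=0.02, size=(num_experts, d_ff, d_model))

    def __call__(self, x):  # x: (tokens, d_model)
        scores = softmax(x @ self.w_gate)                 # (tokens, num_experts)
        top = np.argsort(-scores, axis=-1)[:, :self.top_k]
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            # Only top_k experts run per token -- this is the "sparse activation":
            # compute scales with top_k, not with the total number of experts.
            weights = scores[t, top[t]]
            weights = weights / weights.sum()
            for w, e in zip(weights, top[t]):
                h = np.maximum(x[t] @ self.w1[e], 0.0)    # ReLU expert MLP
                out[t] += w * (h @ self.w2[e])
        return out

tokens = np.random.default_rng(1).normal(size=(5, 16))
print(MoEFeedForward()(tokens).shape)  # (5, 16)
```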
Alternatives and similar repositories for AutoMoE:
Users interested in AutoMoE are comparing it to the libraries listed below.
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆51 · Updated 2 years ago
- ☆53 · Updated 10 months ago
- Code for "Merging Text Transformers from Different Initializations" ☆20 · Updated 3 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. ☆48 · Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆39 · Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆51 · Updated 3 months ago
- ☆45 · Updated 2 months ago
- This repository contains papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs). ☆11 · Updated 11 months ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆80 · Updated last year
- This package implements THOR: Transformer with Stochastic Experts. ☆61 · Updated 3 years ago
- Repo for the ACL 2023 Findings paper "Emergent Modularity in Pre-trained Transformers" ☆23 · Updated last year
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- Official implementation of "The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs" ☆26 · Updated 2 weeks ago
- [NeurIPS 2022] DreamShard: Generalizable Embedding Table Placement for Recommender Systems ☆29 · Updated 2 years ago
- ☆25 · Updated last year
- Repository for CPU Kernel Generation for LLM Inference ☆26 · Updated last year
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆28 · Updated last year
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 · Updated 11 months ago
- Linear Attention Sequence Parallelism (LASP)