lucidrains / mixture-of-experts
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
☆655 · Updated last year
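The repository implements the sparsely-gated routing idea from Shazeer et al.: a learned gate scores each token against every expert, only the top-k experts are selected per token, and their outputs are combined using the renormalized gate weights. The snippet below is a minimal sketch of that top-k routing pattern in plain PyTorch, not the repository's actual API; the `TopKMoE` class, its parameters, and the omission of the auxiliary load-balancing loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Illustrative top-k gated mixture-of-experts layer (not the repo's API)."""

    def __init__(self, dim, num_experts=8, hidden_dim=2048, k=2):
        super().__init__()
        self.k = k
        # each expert is an independent feed-forward network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])
        # the gate scores every token against every expert
        self.gate = nn.Linear(dim, num_experts, bias=False)

    def forward(self, x):                                # x: (batch, seq, dim)
        scores = self.gate(x)                            # (batch, seq, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = topk_scores.softmax(dim=-1)            # renormalize over the k chosen experts

        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            # per-token weight for expert i; zero for tokens not routed to it
            w = (weights * (topk_idx == i)).sum(dim=-1, keepdim=True)
            # for clarity every expert runs on all tokens here; a real implementation
            # dispatches only the routed tokens so the compute stays sparse
            out = out + w * expert(x)
        return out

moe = TopKMoE(dim=512, num_experts=8, k=2)
x = torch.randn(2, 16, 512)
print(moe(x).shape)  # torch.Size([2, 16, 512])
```

A production MoE layer would also add the paper's auxiliary load-balancing loss and per-expert capacity limits, which are omitted here to keep the routing logic visible.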
Alternatives and similar repositories for mixture-of-experts:
Users interested in mixture-of-experts are comparing it to the libraries listed below
- PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538) ☆1,009 · Updated 7 months ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆294 · Updated 6 months ago
- Tutel MoE: An Optimized Mixture-of-Experts Implementation ☆740 · Updated 3 weeks ago
- ☆584 · Updated last week
- A collection of AWESOME things about mixture-of-experts ☆998 · Updated last week
- A curated reading list of research in Mixture-of-Experts (MoE) ☆546 · Updated last month
- Code for the ALiBi method for transformer language models (ICLR 2022) ☆507 · Updated last year
- Implementation of Rotary Embeddings, from the RoFormer paper, in Pytorch ☆595 · Updated 2 weeks ago
- A fast MoE implementation for PyTorch ☆1,579 · Updated 5 months ago
- Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time" ☆353 · Updated last year
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆248 · Updated 7 months ago
- Transformer based on a variant of attention that has linear complexity with respect to sequence length ☆707 · Updated 7 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆484 · Updated last month
- Long Range Arena for Benchmarking Efficient Transformers ☆735 · Updated last year
- An implementation of local windowed attention for language modeling ☆393 · Updated 3 months ago
- Rotary Transformer ☆836 · Updated 2 years ago
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from …" ☆137 · Updated 7 months ago
- Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022) ☆517 · Updated 2 years ago
- Large Context Attention ☆654 · Updated 4 months ago
- Helpful tools and examples for working with flex-attention ☆531 · Updated this week
- TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale ☆1,497 · Updated last week
- Implementation of Linformer for Pytorch ☆257 · Updated 11 months ago
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time ☆434 · Updated 5 months ago
- Collection of papers on state-space models ☆557 · Updated last month
- Diffusion-LM ☆1,063 · Updated 4 months ago
- Implementation of Recurrent Memory Transformer (NeurIPS 2022) in Pytorch ☆395 · Updated 3 weeks ago
- Sequence modeling with Mega ☆298 · Updated last year
- Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton ☆1,420 · Updated this week
- Official implementation of TransNormerLLM: A Faster and Better LLM ☆231 · Updated 10 months ago
- Pytorch library for fast transformer implementations ☆1,654 · Updated last year