YeonwooSung / Pytorch_mixture-of-experts
PyTorch implementation of MoE (Mixture of Experts)
☆38 · Updated 4 years ago
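For context, a mixture-of-experts layer combines several expert networks through a learned gating function. The sketch below is illustrative only (not this repository's actual code): a minimal PyTorch MoE layer with a softmax gating network and dense (soft) routing over a few expert MLPs; all names and sizes are assumptions chosen for the example.

```python
# Minimal illustrative sketch of a mixture-of-experts layer (not this repo's code).
# A gating network produces softmax weights over experts; each input's output is
# the gate-weighted sum of all expert outputs (dense routing, kept simple here).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoE(nn.Module):
    def __init__(self, dim, num_experts=4, hidden=256):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, dim)
        weights = F.softmax(self.gate(x), dim=-1)                     # (batch, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)    # (batch, E, dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)           # (batch, dim)

x = torch.randn(8, 64)
print(MoE(64)(x).shape)  # torch.Size([8, 64])
```

Sparse variants (e.g. top-k routing) send each token to only a subset of experts; the repositories listed below explore several such designs.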
Alternatives and similar repositories for Pytorch_mixture-of-experts:
Users interested in Pytorch_mixture-of-experts are comparing it to the libraries listed below.
- Implementation of Infini-Transformer in Pytorch ☆109 · Updated last month
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind ☆56 · Updated 5 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆71 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 · Updated 4 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆114 · Updated 3 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆149 · Updated last month
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆43 · Updated last week
- A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ability ☆89 · Updated last month
- ☆41 · Updated 3 weeks ago
- Code for the NeurIPS LLM Efficiency Challenge ☆55 · Updated 10 months ago
- Model Stock: All we need is just a few fine-tuned models ☆102 · Updated 4 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆36 · Updated 4 months ago
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context ☆25 · Updated 6 months ago
- Pytorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆24 · Updated this week
- ☆37 · Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆36 · Updated 7 months ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Updated 2 years ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch ☆37 · Updated 2 years ago
- Repository for the paper "500xCompressor: Generalized Prompt Compression for Large Language Models" ☆24 · Updated 6 months ago
- ☆47 · Updated 5 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆173 · Updated 5 months ago
- ☆47 · Updated last year
- Video descriptions of research papers relating to foundation models and scaling ☆30 · Updated last year
- ☆60 · Updated last week
- Implementation of Soft MoE, proposed by Google Brain's Vision team, in Pytorch ☆260 · Updated 9 months ago
- Efficient Infinite Context Transformers with Infini-attention Pytorch Implementation + QwenMoE Implementation + Training Script + 1M cont… ☆76 · Updated 9 months ago
- M4 experiment logbook ☆56 · Updated last year
- Collection of autoregressive model implementations ☆81 · Updated this week
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch ☆99 · Updated last year