YeonwooSung / Pytorch_mixture-of-experts
PyTorch implementation of MoE (Mixture of Experts)
☆36 · Updated 3 years ago
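As a rough illustration of what this kind of repository implements, below is a minimal, hypothetical sketch of a top-k gated mixture-of-experts layer in PyTorch. It is not taken from this repository; all names (`MoELayer`, `num_experts`, `top_k`) are placeholders. Each token is scored by a learned gate, routed to its top-k experts, and the expert outputs are combined with the renormalized gate weights.

```python
# Minimal illustrative sketch of a top-k gated mixture-of-experts layer.
# Not the code from this repository; all names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The gate scores every token against every expert.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.size(-1))
        logits = self.gate(tokens)                        # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # renormalize over the chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Tokens routed to expert e, and the slot holding their gate weight.
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = MoELayer(d_model=32, d_hidden=64)
    y = layer(torch.randn(2, 10, 32))
    print(y.shape)  # torch.Size([2, 10, 32])
```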
Alternatives and similar repositories for Pytorch_mixture-of-experts:
Users interested in Pytorch_mixture-of-experts are comparing it to the libraries listed below.
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 · Updated 4 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆69 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆109 · Updated 3 weeks ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆48 · Updated 3 months ago
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind ☆55 · Updated 4 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆79 · Updated this week
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆24 · Updated this week
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆113 · Updated 3 months ago
- ☆37 · Updated last year
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge. ☆76 · Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆80 · Updated 2 weeks ago
- A Closer Look into Mixture-of-Experts in Large Language Models ☆41 · Updated 5 months ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch ☆37 · Updated 2 years ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆257 · Updated 9 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆123 · Updated 9 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆173 · Updated 4 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ☆61 · Updated 3 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆54 · Updated 9 months ago
- ☆160 · Updated 11 months ago
- ☆57 · Updated this week
- ☆41 · Updated 2 weeks ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆149 · Updated last month
- Model Stock: All we need is just a few fine-tuned models ☆100 · Updated 4 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆76 · Updated 4 months ago
- Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters ☆86 · Updated last year
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and inexpensive to use. Specific… ☆62 · Updated 4 months ago
- Video descriptions of research papers relating to foundation models and scaling ☆30 · Updated last year
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆50 · Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models ☆81 · Updated last year