google-research / vmoe
☆575 · Updated last week
Related projects
Alternatives and complementary repositories for vmoe
- A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models (☆642, updated last year)
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch (☆293, updated 5 months ago)
- Tutel MoE: An Optimized Mixture-of-Experts Implementation (☆736, updated this week)
- A method to increase the speed and lower the memory footprint of existing vision transformers. (☆971, updated 5 months ago)
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch (☆248, updated 6 months ago)
- Official code for our CVPR'22 paper “Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space” (☆246, updated last year)
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time (☆429, updated 4 months ago)
- DataComp: In search of the next generation of multimodal datasets (☆658, updated 10 months ago)
- Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models (☆761, updated 4 months ago)
- PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 (☆984, updated 7 months ago); see the routing sketch after this list
- Official Open Source code for "Scaling Language-Image Pre-training via Masking" (☆407, updated last year)
- iBOT: Image BERT Pre-Training with Online Tokenizer (ICLR 2022) (☆680, updated 2 years ago)
- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm (☆637, updated 2 years ago)
- A curated reading list of research in Mixture-of-Experts (MoE). (☆541, updated 3 weeks ago)
- Implementation of "Attention Is Off By One" by Evan Miller (☆184, updated last year)
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from … (☆133, updated 6 months ago)
- CLIP-like model evaluation (☆618, updated 3 months ago)
- A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains. (☆391, updated last month)
- (☆267, updated last year)
- A collection of AWESOME things about mixture-of-experts (☆974, updated 3 months ago)
- This repository contains the implementation for the paper "EMP-SSL: Towards Self-Supervised Learning in One Training Epoch." (☆221, updated last year)
- Offsite-Tuning: Transfer Learning without Full Model (☆368, updated 11 months ago)
- Rotary Transformer (☆826, updated 2 years ago)
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch (☆477, updated 3 weeks ago)
- PyTorch codes for "LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning" (☆232, updated last year)
- Microsoft Automatic Mixed Precision Library (☆525, updated last month)
- Official implementation of TransNormerLLM: A Faster and Better LLM (☆229, updated 10 months ago)
- Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory" (☆362, updated last year)
- Code release for "Dropout Reduces Underfitting" (☆312, updated last year)
- Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022) (☆516, updated 2 years ago)
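Several of the entries above (vmoe itself, Tutel, the ST-MoE and Shazeer re-implementations) revolve around the same core idea: a learned router sends each token to a small subset of expert feed-forward networks. As a rough orientation only, here is a minimal, unoptimized sketch of top-k expert routing in PyTorch; the class and parameter names (`TopKMoE`, `num_experts`, `k`) are made up for illustration, and the code does not reproduce the gating noise, load-balancing losses, or capacity handling of vmoe or any listed repository.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Illustrative sparsely-gated MoE layer with top-k token routing."""

    def __init__(self, dim, num_experts=8, k=2, hidden_dim=None):
        super().__init__()
        hidden_dim = hidden_dim or 4 * dim
        self.k = k
        # Router: scores every token against every expert.
        self.gate = nn.Linear(dim, num_experts, bias=False)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (batch, tokens, dim); each token is routed independently.
        b, t, d = x.shape
        flat = x.reshape(-1, d)
        logits = self.gate(flat)                    # (b*t, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # choose k experts per token
        weights = weights.softmax(dim=-1)           # renormalize over the chosen k
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            # Tokens that selected expert e in any of their k routing slots.
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(flat[token_ids])
        return out.reshape(b, t, d)

if __name__ == "__main__":
    moe = TopKMoE(dim=256)
    tokens = torch.randn(2, 196, 256)  # e.g. ViT patch tokens
    print(moe(tokens).shape)           # torch.Size([2, 196, 256])
```

The per-expert Python loop keeps the sketch readable; production implementations such as Tutel instead batch tokens per expert and overlap the all-to-all communication, which is where most of their speedups come from.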