google-research / vmoeLinks

☆683

Alternatives and similar repositories for vmoe

Users that are interested in vmoe are comparing it to the libraries listed below

Sorting:

microsoft / Tutel
Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
☆934Updated 3 weeks ago
lucidrains / mixture-of-experts
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
☆821Updated 2 years ago
mlfoundations / model-soups
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
☆488Updated last year
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆366Updated last year
facebookresearch / ToMe
A method to increase the speed and lower the memory footprint of existing vision transformers.
☆1,110Updated last year
lucidrains / soft-moe-pytorch
Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch
☆331Updated 6 months ago
Arnav0400 / ViT-Slim
Official code for our CVPR'22 paper “Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space”
☆250Updated 2 months ago
codecaution / Awesome-Mixture-of-Experts-Papers
A curated reading list of research in Mixture-of-Experts(MoE).
☆648Updated 11 months ago
bytedance / ibot
iBOT : Image BERT Pre-Training with Online Tokenizer (ICLR 2022)
☆748Updated 3 years ago
davidmrau / mixture-of-experts
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
☆1,188Updated last year
LAION-AI / CLIP_benchmark
CLIP-like model evaluation
☆780Updated 2 months ago
sail-sg / Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
☆802Updated 4 months ago
mlfoundations / datacomp
DataComp: In search of the next generation of multimodal datasets
☆745Updated 5 months ago
NUS-HPC-AI-Lab / InfoBatch
Lossless Training Speed Up by Unbiased Dynamic Data Pruning
☆341Updated last year
facebookresearch / flip
Official Open Source code for "Scaling Language-Image Pre-training via Masking"
☆428Updated 2 years ago
mlfoundations / wise-ft
Robust fine-tuning of zero-shot models
☆744Updated 3 years ago
laekov / fastmoe
A fast MoE impl for PyTorch
☆1,806Updated 8 months ago
facebookresearch / multimodal
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
☆1,656Updated last week
Sense-GVT / DeCLIP
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
☆667Updated 3 years ago
Alibaba-MIIL / ImageNet21K
Official Pytorch Implementation of: "ImageNet-21K Pretraining for the Masses"(NeurIPS, 2021) paper
☆774Updated 2 years ago
zhijian-liu / torchprofile
A general and accurate MACs / FLOPs profiler for PyTorch models
☆629Updated 2 months ago
google-research / pix2seq
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
☆930Updated last year
raoyongming / DynamicViT
[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
☆634Updated 2 years ago
lucidrains / memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆383Updated 2 years ago
lucidrains / flamingo-pytorch
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch
☆1,266Updated 3 years ago
jianghaojun / Awesome-Parameter-Efficient-Transfer-Learning
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
☆409Updated last year
ziplab / SN-Net
[CVPR 2023 Highlight] This is the official implementation of "Stitchable Neural Networks".
☆249Updated 2 years ago
lucidrains / linear-attention-transformer
Transformer based on a variant of attention that is linear complexity in respect to sequence length
☆801Updated last year
kyegomez / AttentionIsOFFByOne
Implementation of "Attention Is Off By One" by Evan Miller
☆196Updated 2 years ago
XueFuzhao / awesome-mixture-of-experts
A collection of AWESOME things about mixture-of-experts
☆1,217Updated 10 months ago