jaisidhsingh / pytorch-mixtures
One-stop solutions for Mixture of Experts and Mixture of Depth modules in PyTorch.
☆25 Updated 6 months ago
Alternatives and similar repositories for pytorch-mixtures
Users who are interested in pytorch-mixtures are comparing it to the libraries listed below.
- [ICLR 2025] CAMEx: Curvature-Aware Merging of Experts ☆22 Updated 9 months ago
- This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision" ☆36 Updated 2 years ago
- [ICLR 2024 Oral] Improving Convergence and Generalization Using Parameter Symmetries ☆30 Updated last year
- ☆36 Updated 8 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆120 Updated last year
- Implementation of a multimodal diffusion transformer in Pytorch ☆107 Updated last year
- Pytorch Implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers" ☆92 Updated last month
- Implementation of Infini-Transformer in Pytorch ☆113 Updated 11 months ago
- Video descriptions of research papers relating to foundation models and scaling ☆30 Updated 2 years ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆56 Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆31 Updated last year
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024 ☆109 Updated last year
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆132 Updated last month
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆78 Updated 2 years ago
- A regression-like loss to improve numerical reasoning in language models - ICML 2025 ☆27 Updated 3 months ago
- The official repo of continuous speculative decoding ☆30 Updated 8 months ago
- Sparse Autoencoders for Stable Diffusion XL models. ☆79 Updated last month
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆60 Updated 11 months ago
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆97 Updated 11 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆118 Updated 2 weeks ago
- This repository is the implementation of the paper Training Free Pretrained Model Merging (CVPR2024). ☆32 Updated last year
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆66 Updated 2 years ago
- MobileLLM-R1 ☆68 Updated 2 months ago
- Contrastive Reinforcement Learning ☆49 Updated last week
- code for the ddp tutorial ☆32 Updated 3 years ago
- An official PyTorch implementation for CLIPPR ☆29 Updated 2 years ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆337 Updated 8 months ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆101 Updated last year
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with … ☆61 Updated last year
- ☆48 Updated last month