jaisidhsingh / pytorch-mixtures
One-stop solutions for Mixture of Experts and Mixture of Depth modules in PyTorch.
☆24 Updated 3 months ago
Alternatives and similar repositories for pytorch-mixtures
Users who are interested in pytorch-mixtures are comparing it to the libraries listed below.
- [ICLR 2025] CAMEx: Curvature-Aware Merging of Experts ☆22 Updated 6 months ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆58 Updated 9 months ago
- This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision" ☆36 Updated 2 years ago
- Implementation of a multimodal diffusion transformer in PyTorch ☆104 Updated last year
- The official repo of continuous speculative decoding ☆29 Updated 5 months ago
- Official implementation for Equivariant Architectures for Learning in Deep Weight Spaces [ICML 2023] ☆89 Updated 2 years ago
- Sparse Autoencoders for Stable Diffusion XL models. ☆69 Updated last month
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆101 Updated last year
- Video descriptions of research papers relating to foundation models and scaling ☆31 Updated 2 years ago
- PyTorch implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers" ☆86 Updated last week
- ☆35 Updated 6 months ago
- Contrastive Reinforcement Learning ☆44 Updated 2 weeks ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆121 Updated 11 months ago
- Autoregressive Image Generation ☆32 Updated 3 months ago
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024 ☆104 Updated last year
- ☆52 Updated 8 months ago
- [ICLR 2024 Oral] Improving Convergence and Generalization Using Parameter Symmetries ☆29 Updated last year
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆43 Updated 10 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆78 Updated last year
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆66 Updated 5 months ago
- Implementation of the proposed DeepCrossAttention by Heddes et al. at Google Research, in PyTorch ☆93 Updated 6 months ago
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆96 Updated 9 months ago
- ☆12 Updated 3 years ago
- A curated list of Model Merging methods. ☆92 Updated last year
- Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization" ☆53 Updated last year
- Implementation of Agent Attention in PyTorch ☆91 Updated last year
- This repository is the implementation of the paper Training Free Pretrained Model Merging (CVPR 2024). ☆31 Updated last year
- Official repo for Detecting, Explaining, and Mitigating Memorization in Diffusion Models (ICLR 2024) ☆76 Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆128 Updated last year
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆56 Updated last year