jaisidhsingh / pytorch-mixtures
One-stop solutions for Mixture of Experts and Mixture of Depth modules in PyTorch.
☆26 Updated 8 months ago
Alternatives and similar repositories for pytorch-mixtures
Users who are interested in pytorch-mixtures are comparing it to the libraries listed below.
- [ICLR 2025] CAMEx: Curvature-Aware Merging of Experts ☆22 Updated 11 months ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆61 Updated last year
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024 ☆110 Updated last year
- Sparse Autoencoders for Stable Diffusion XL models. ☆79 Updated 3 months ago
- MobileLLM-R1 ☆75 Updated 4 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆31 Updated last year
- This is a PyTorch implementation of the paper ViP: A Differentially Private Foundation Model for Computer Vision ☆36 Updated 2 years ago
- Unofficial Implementation of Selective Attention Transformer ☆20 Updated last year
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆85 Updated 9 months ago
- This repository is the implementation of the paper Training Free Pretrained Model Merging (CVPR2024). ☆32 Updated last year
- Autoregressive Image Generation ☆31 Updated 7 months ago
- Implementation of Infini-Transformer in Pytorch ☆112 Updated last year
- We study toy models of skill learning. ☆31 Updated last year
- Implementation of a multimodal diffusion transformer in Pytorch ☆107 Updated last year
- ☆36 Updated 10 months ago
- ☆23 Updated last year
- ☆32 Updated last year
- [ICML 24] A novel automated neuron explanation framework that can accurately describe poly-semantic concepts in deep neural networks ☆14 Updated 8 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆82 Updated 2 years ago
- Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization" ☆55 Updated last year
- [ICLR 2024 Oral] Improving Convergence and Generalization Using Parameter Symmetries ☆31 Updated last year
- ☆52 Updated last month
- ☆191 Updated last year
- source code for paper "Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models" ☆33 Updated last year
- Contrastive Reinforcement Learning ☆59 Updated 2 weeks ago
- [TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models ☆10 Updated 11 months ago
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts" ☆19 Updated 2 years ago
- Official PyTorch implementation of DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs (ICML 2025 Oral) ☆55 Updated 7 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆122 Updated last year
- Official implementation for Equivariant Architectures for Learning in Deep Weight Spaces [ICML 2023] ☆90 Updated 2 years ago