ysngki / UMoE
☆21 Updated 2 months ago
Alternatives and similar repositories for UMoE
Users who are interested in UMoE are comparing it to the libraries listed below
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆17 Updated 9 months ago
- Unofficial Implementation of Selective Attention Transformer ☆20 Updated last year
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆31 Updated 8 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆56 Updated last year
- ☆19 Updated 11 months ago
- Learning to Skip the Middle Layers of Transformers ☆16 Updated 5 months ago
- ☆48 Updated last year
- ☆46 Updated last year
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning ☆46 Updated last year
- Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization" ☆55 Updated last year
- Unofficial implementation of paper : Exploring the Space of Key-Value-Query Models with Intention ☆12 Updated 2 years ago
- ☆34 Updated 10 months ago
- ☆19 Updated 9 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆46 Updated last year
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆61 Updated last year
- ☆17 Updated 6 months ago
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆27 Updated 5 months ago
- [NeurIPS'24] Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization ☆38 Updated last year
- Collect papers about Mamba (a selective state space model). ☆14 Updated last year
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆32 Updated 8 months ago
- [NeurIPS 2024] VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections ☆21 Updated last year
- ☆20 Updated 2 months ago
- ☆40 Updated last year
- [NeurIPS '25] Multi-Token Prediction Needs Registers ☆26 Updated 3 weeks ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation… ☆21 Updated 2 years ago
- This is an implementation of the paper "Are We Done with Object-Centric Learning?" ☆12 Updated 3 months ago
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024) ☆15 Updated last year
- ☆50 Updated 11 months ago
- CatMAE ☆14 Updated 2 years ago
- [NeurIPS 2024, spotlight] Multivariate Learned Adaptive Noise for Diffusion Models ☆30 Updated last year