ysngki / UMoE
☆21 · Updated last month
Alternatives and similar repositories for UMoE
Users interested in UMoE are comparing it to the repositories listed below.
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆17 · Updated 9 months ago
- Official PyTorch implementation for "Vision-Language Models Create Cross-Modal Task Representations", ICML 2025 ☆31 · Updated 7 months ago
- ☆48 · Updated last year
- ☆19 · Updated 11 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Updated last year
- Official implementation of the ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models" ☆15 · Updated 7 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆46 · Updated last year
- Learning to Skip the Middle Layers of Transformers ☆15 · Updated 4 months ago
- ☆46 · Updated last year
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆27 · Updated last year
- [NeurIPS 2024, spotlight] Multivariate Learned Adaptive Noise for Diffusion Models ☆30 · Updated last year
- ☆16 · Updated 6 months ago
- [NeurIPS 2024] VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections ☆21 · Updated last year
- Official code for the ICLR 2024 paper "Non-negative Contrastive Learning" ☆46 · Updated last year
- [ICLR 2025] Official PyTorch implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆27 · Updated 4 months ago
- Official implementation of the ECCV24 paper POA ☆24 · Updated last year
- ☆34 · Updated 10 months ago
- Code for "Are "Hierarchical" Visual Representations Hierarchical?" in the NeurIPS Workshop for Symmetry and Geometry in Neural Representation… ☆21 · Updated 2 years ago
- A collection of papers about Mamba (a selective state space model) ☆14 · Updated last year
- Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆61 · Updated last year
- Code for the paper "Cottention: Linear Transformers with Cosine Attention" ☆20 · Updated last month
- A simple PyTorch implementation of high-performance Multi-Query Attention ☆16 · Updated 2 years ago
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆16 · Updated last year
- [NeurIPS'24] Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization ☆37 · Updated last year
- Unofficial implementation of the Selective Attention Transformer ☆18 · Updated last year
- CatMAE ☆14 · Updated 2 years ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor vision-language models ☆19 · Updated 11 months ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆31 · Updated 8 months ago
- ☆16 · Updated 2 years ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆57 · Updated last year