qiuzh20 / RMoE
Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)
☆17 Updated 7 months ago
Alternatives and similar repositories for RMoE:
Users who are interested in RMoE are comparing it to the repositories listed below.
- HGRN2: Gated Linear RNNs with State Expansion ☆53 Updated 6 months ago
- Official implementation of ECCV24 paper: POA ☆24 Updated 6 months ago
- Project for SNARE benchmark ☆10 Updated 8 months ago
- Official PyTorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxiang Li, Lu Yi… ☆17 Updated 2 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆29 Updated 3 weeks ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆34 Updated 8 months ago
- arXiv 23 "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" ☆14 Updated 3 months ago
- Official PyTorch Implementation for Task Vectors are Cross-Modal ☆22 Updated 2 months ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆31 Updated last year
- The official GitHub page for the survey paper "A Survey of RWKV". ☆22 Updated last month
- Collect papers about Mamba (a selective state space model). ☆14 Updated 6 months ago
- Implementation of the model "Hedgehog" from the paper: "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry" ☆13 Updated 11 months ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 Updated last year
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆32 Updated 3 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆19 Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆24 Updated 4 months ago
- PyTorch implementation of StableMask (ICML'24) ☆12 Updated 8 months ago
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆14 Updated 3 months ago
- Code for T-MARS data filtering ☆35 Updated last year
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆23 Updated 7 months ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models. ☆27 Updated last year
- Implementation of "PaLM2-VAdapter" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron… ☆17 Updated 3 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆14 Updated this week
- MIO: A Foundation Model on Multimodal Tokens ☆21 Updated 2 months ago