qiuzh20 / RMoE
Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)
☆20 · Updated 8 months ago
Alternatives and similar repositories for RMoE
Users who are interested in RMoE are comparing it to the libraries listed below:
- HGRN2: Gated Linear RNNs with State Expansion ☆54 · Updated 8 months ago
- Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆15 · Updated last month
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆26 · Updated 2 weeks ago
- PyTorch implementation of StableMask (ICML'24) ☆12 · Updated 9 months ago
- Code for "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" ☆14 · Updated 2 weeks ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆30 · Updated 10 months ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆31 · Updated last year
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆37 · Updated 6 months ago
- Official PyTorch Implementation for Task Vectors are Cross-Modal ☆22 · Updated 4 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆46 · Updated this week
- ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 ☆63 · Updated 5 months ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ☆40 · Updated last year
- Official implementation of ECCV24 paper: POA ☆24 · Updated 8 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆42 · Updated last month
- A repository for DenseSSMs ☆87 · Updated last year
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆54 · Updated 4 months ago
- [ICLR 2025] Official PyTorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆19 · Updated 4 months ago
- The official GitHub page for the survey paper "A Survey of RWKV". ☆25 · Updated 3 months ago
- Code for T-MARS data filtering ☆35 · Updated last year
- Here we will test various linear attention designs. ☆60 · Updated last year
- Triton implementation of bi-directional (non-causal) linear attention ☆46 · Updated 2 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆49 · Updated 3 weeks ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆29 · Updated 6 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆17 · Updated 6 months ago
- [CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention ☆16 · Updated last month
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆25 · Updated 5 months ago