rachtsy / MomentumSMoELinks
Implementation for MomentumSMoE
☆18Updated 4 months ago
Alternatives and similar repositories for MomentumSMoE
Users that are interested in MomentumSMoE are comparing it to the libraries listed below
Sorting:
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…☆26Updated last month
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)☆30Updated 4 months ago
- [NeurIPS 2024] AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models☆27Updated 2 months ago
- ☆17Updated 7 months ago
- ☆35Updated 5 months ago
- ☆19Updated 3 months ago
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"☆14Updated 5 months ago
- ☆19Updated 5 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆119Updated last month
- Unofficial Implementation of Selective Attention Transformer☆17Updated 10 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆16Updated 5 months ago
- [COLING'25] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?☆80Updated 7 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025☆30Updated 4 months ago
- ☆26Updated last year
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs☆29Updated 3 months ago
- ☆45Updated last month
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆44Updated last month
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)☆29Updated 6 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆45Updated 10 months ago
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging☆67Updated 6 months ago
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More☆33Updated 3 months ago
- The open-source Mixture of Depths code and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for E…☆15Updated 2 weeks ago
- [ICLR'25] Code for KaSA, an official implementation of "KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models"☆20Updated 7 months ago
- A holistic benchmark for LLM abstention☆48Updated last week
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆104Updated last week
- Awesome Low-Rank Adaptation☆43Updated 3 weeks ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models☆54Updated 6 months ago
- ☆15Updated 9 months ago
- ☆43Updated 9 months ago
- ☆31Updated 6 months ago