rachtsy / MomentumSMoE
Implementation for MomentumSMoE
☆19Updated 8 months ago
Alternatives and similar repositories for MomentumSMoE
Users who are interested in MomentumSMoE compare it to the libraries listed below.
- ☆21Updated 2 months ago
- ☆19Updated 9 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆17Updated 10 months ago
- ☆47Updated 3 months ago
- Learning to Skip the Middle Layers of Transformers☆16Updated 5 months ago
- DropKAN (Dropout Kolmogorov Arnold Networks)☆17Updated 6 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)☆35Updated 10 months ago
- ☆36Updated 9 months ago
- ☆50Updated 11 months ago
- [NeurIPS '25] Multi-Token Prediction Needs Registers☆26Updated 3 weeks ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆40Updated last year
- Remasking Discrete Diffusion Models with Inference-Time Scaling☆63Updated 10 months ago
- ☆30Updated last year
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models https://arxiv.org/pdf/2411.02433 ☆116Updated last year
- [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs☆50Updated last month
- Implementations of and experiments with mHC by DeepSeek - https://arxiv.org/abs/2512.24880 ☆202Updated last week
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)☆32Updated 9 months ago
- ☆78Updated 11 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)☆85Updated last year
- Unofficial Implementation of Selective Attention Transformer☆19Updated last year
- [NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models☆57Updated last month
- ☆17Updated 5 months ago
- Source code for the NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"☆64Updated 9 months ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆56Updated 7 months ago
- ☆46Updated last year
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…☆27Updated 5 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models".☆15Updated 8 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆151Updated 6 months ago
- [ICLR 2025] Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization☆20Updated 3 months ago
- LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently (ICML2025 Oral)☆28Updated 2 months ago