rachtsy / MomentumSMoE
Implementation for MomentumSMoE
☆18 · Updated 3 months ago
Alternatives and similar repositories for MomentumSMoE
Users who are interested in MomentumSMoE are comparing it to the repositories listed below.
- Unofficial Implementation of Selective Attention Transformer ☆17 · Updated 9 months ago
- [NeurIPS 2024] AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models ☆26 · Updated 2 months ago
- source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" ☆52 · Updated 4 months ago
- ☆28 · Updated 9 months ago
- ☆43 · Updated 9 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆42 · Updated 9 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆29 · Updated 5 months ago
- ☆19 · Updated 4 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆15 · Updated 3 months ago
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆25 · Updated 2 weeks ago
- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models ☆25 · Updated 3 months ago
- [COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model ☆18 · Updated last month
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆15 · Updated 5 months ago
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆39 · Updated 9 months ago
- a training-free approach to accelerate ViTs and VLMs by pruning redundant tokens based on similarity ☆30 · Updated 2 months ago
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆48 · Updated 3 months ago
- The official repo of continuous speculative decoding ☆27 · Updated 4 months ago
- EMPO, A Fully Unsupervised RLVR Method ☆56 · Updated last week
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆28 · Updated 4 months ago
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with … ☆60 · Updated 10 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆107 · Updated last month
- [NeurIPS 2024] RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models ☆25 · Updated 9 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling ☆36 · Updated 5 months ago
- ☆16 · Updated 3 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆30 · Updated 3 months ago
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆62 · Updated 5 months ago
- This repo contains the source code for VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks (NeurIPS 2024). ☆39 · Updated 9 months ago
- ☆48 · Updated 3 weeks ago
- ☆34 · Updated 5 months ago
- Official repo of Progressive Data Expansion: data, code and evaluation ☆29 · Updated last year