ScaledFoundations / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆57 Updated 2 months ago
Alternatives and similar repositories for MatMamba:
Users who are interested in MatMamba are comparing it to the repositories listed below.
- ☆51 Updated 5 months ago
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆31 Updated 3 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆36 Updated 4 months ago
- Train, tune, and infer Bamba model ☆83 Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆54 Updated 5 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 Updated last month
- Official implementation of ECCV24 paper: POA ☆24 Updated 6 months ago
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆17 Updated 6 months ago
- MEXMA: Token-level objectives improve sentence representations ☆40 Updated last month
- ☆20 Updated last month
- ☆71 Updated 5 months ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆51 Updated 5 months ago
- ☆25 Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆24 Updated 3 months ago
- PyTorch implementation for MRL ☆18 Updated 11 months ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆50 Updated 2 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆35 Updated last year
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆24 Updated this week
- HGRN2: Gated Linear RNNs with State Expansion ☆52 Updated 5 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 Updated 11 months ago
- Experimental scripts for researching data adaptive learning rate scheduling. ☆23 Updated last year
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆55 Updated 8 months ago
- Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork" ☆31 Updated last year
- Utilities for Training Very Large Models ☆57 Updated 4 months ago
- ☆59 Updated last week
- ☆39 Updated 6 months ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆43 Updated 4 months ago
- ☆42 Updated last year