ScaledFoundations / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆58 Updated 4 months ago
Alternatives and similar repositories for MatMamba:
Users who are interested in MatMamba are comparing it to the libraries listed below.
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆38 Updated 5 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 6 months ago
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆33 Updated 4 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆24 Updated 4 months ago
- ☆67 Updated 7 months ago
- ☆25 Updated last year
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆52 Updated 6 months ago
- ☆73 Updated 7 months ago
- Official implementation of ECCV24 paper: POA ☆24 Updated 7 months ago
- Train, tune, and infer Bamba model ☆86 Updated 2 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 Updated last week
- ☆31 Updated 10 months ago
- PyTorch implementation for MRL ☆18 Updated last year
- A repository for research on medium sized language models. ☆76 Updated 9 months ago
- Aioli: A unified optimization framework for language model data mixing ☆22 Updated 2 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 Updated 9 months ago
- ☆63 Updated 5 months ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 Updated 7 months ago
- ☆52 Updated 6 months ago
- ☆47 Updated 6 months ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆14 Updated last year
- ☆43 Updated last year
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch ☆88 Updated last year
- ☆79 Updated 11 months ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆21 Updated 2 weeks ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 Updated 5 months ago
- ☆65 Updated this week