ScaledFoundations / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆50 Updated last month
Related projects
Alternatives and complementary repositories for MatMamba
- DPO, but faster ☆21 Updated 2 weeks ago
- Official implementation of ECCV24 paper: POA ☆24 Updated 3 months ago
- ☆21 Updated last week
- Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis" ☆49 Updated 2 months ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆77 Updated last month
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆29 Updated 3 weeks ago
- Implementation of Bitune: Bidirectional Instruction-Tuning ☆15 Updated 5 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆91 Updated last month
- HGRN2: Gated Linear RNNs with State Expansion ☆49 Updated 2 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆35 Updated 10 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆46 Updated 2 months ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆33 Updated 2 months ago
- Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆23 Updated last week
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆18 Updated 11 months ago
- Research on Tabular Foundation Models ☆27 Updated 2 months ago
- Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models" ☆21 Updated this week
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite. ☆33 Updated 8 months ago
- Official code repository for paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts" ☆24 Updated last month
- Official Implementation Of The Paper: "DeciMamba: Exploring the Length Extrapolation Potential of Mamba" ☆20 Updated 3 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆45 Updated last month
- Collection of autoregressive model implementation ☆66 Updated last week
- ☆61 Updated 2 months ago
- ☆29 Updated this week
- ☆25 Updated 2 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆24 Updated last week
- ☆38 Updated 3 months ago
- ☆40 Updated this week
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆25 Updated 3 months ago
- ☆34 Updated 8 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 Updated 11 months ago