ScaledFoundations / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆59 · Updated 4 months ago
Alternatives and similar repositories for MatMamba:
Users who are interested in MatMamba are comparing it to the libraries listed below.
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆39 · Updated 6 months ago
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆35 · Updated 5 months ago
- Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation ☆58 · Updated 2 weeks ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 · Updated last month
- Official implementation of ECCV24 paper: POA ☆24 · Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 7 months ago
- ☆77 · Updated 7 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 · Updated 7 months ago
- Implementation of Mind Evolution (Evolving Deeper LLM Thinking) from DeepMind ☆47 · Updated 2 months ago
- MEXMA: Token-level objectives improve sentence representations ☆40 · Updated 3 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆35 · Updated last month
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆29 · Updated 9 months ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆55 · Updated 11 months ago
- ☆54 · Updated 7 months ago
- BPE modification that implements removing of the intermediate tokens during tokenizer training. ☆25 · Updated 4 months ago
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆24 · Updated this week
- ☆31 · Updated 11 months ago
- Train, tune, and infer Bamba model ☆88 · Updated 3 months ago
- ☆22 · Updated 3 months ago
- ☆67 · Updated 8 months ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 · Updated 8 months ago
- ☆25 · Updated last year
- A testbed for agents and environments that can automatically improve models through data generation. ☆23 · Updated last month
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆98 · Updated 3 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆24 · Updated this week
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ☆36 · Updated last week
- ☆79 · Updated last year
- ☆48 · Updated 5 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- Official implementation of "BERTs are Generative In-Context Learners" ☆26 · Updated last month