GenRobo / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆61 · Updated 9 months ago
Alternatives and similar repositories for MatMamba
Users interested in MatMamba are comparing it to the repositories listed below.
- ☆82 · Updated last year
- ☆69 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated last year
- Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models." ☆45 · Updated 5 months ago
- Fork of the Flame repo for training some new work in development ☆17 · Updated last week
- ☆58 · Updated 4 months ago
- Experimental scripts for researching data-adaptive learning rate scheduling ☆22 · Updated last year
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆101 · Updated 8 months ago
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆39 · Updated last month
- MEXMA: Token-level objectives improve sentence representations ☆41 · Updated 8 months ago
- ☆85 · Updated last year
- Supercharge Hugging Face Transformers with model parallelism ☆77 · Updated last month
- Implementation of 🌻 Mirasol, a SOTA multimodal autoregressive model from Google DeepMind, in PyTorch ☆89 · Updated last year
- Collection of autoregressive model implementations ☆86 · Updated 4 months ago
- Official repository of Pretraining Without Attention (BiGS), the first model to achieve BERT-level transfer learning on the GLUE … ☆114 · Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆146 · Updated 3 months ago
- Code for PHATGOOSE, introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆88 · Updated last year
- Code, results, and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family ☆30 · Updated 5 months ago
- ☆51 · Updated 7 months ago
- Utilities for Training Very Large Models ☆58 · Updated 11 months ago
- One Initialization to Rule Them All: Fine-tuning via Explained Variance Adaptation ☆42 · Updated 11 months ago
- ☆32 · Updated last year
- A public implementation of the ReLoRA pretraining method, built on Lightning AI's PyTorch Lightning suite ☆34 · Updated last year
- Code to download the curated Hugging Face papers into a single markdown-formatted file ☆14 · Updated last year
- ☆69 · Updated last year
- Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork" ☆33 · Updated last year
- Lottery Ticket Adaptation ☆39 · Updated 9 months ago
- The official repository for "HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction" ☆39 · Updated 5 months ago
- ReBase: Training Task Experts through Retrieval-Based Distillation ☆29 · Updated 7 months ago
- Maya: An Instruction-Finetuned Multilingual Multimodal Model using Aya ☆116 · Updated last month