GenRobo / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆62 · Updated last year
Alternatives and similar repositories for MatMamba
Users interested in MatMamba are comparing it to the libraries listed below.
- ☆82 · Updated last year
- ☆68 · Updated last year
- Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models." ☆53 · Updated 4 months ago
- ☆91 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 · Updated last year
- ☆59 · Updated 2 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 · Updated last year
- MEXMA: Token-level objectives improve sentence representations ☆42 · Updated last year
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆46 · Updated 3 months ago
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆46 · Updated 6 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆31 · Updated 9 months ago
- ☆57 · Updated last month
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆103 · Updated last year
- ☆71 · Updated last year
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆34 · Updated 10 months ago
- ☆56 · Updated last year
- ☆80 · Updated last year
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google DeepMind, in PyTorch ☆91 · Updated 2 years ago
- Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE … ☆116 · Updated last year
- PyTorch implementation of models from the Zamba2 series. ☆186 · Updated last year
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind ☆74 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆102 · Updated last year
- Fork of Flame repo for training of some new stuff in development ☆19 · Updated last month
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆160 · Updated last year
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆57 · Updated last year
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆15 · Updated 2 years ago
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks ☆54 · Updated 4 months ago
- Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork" ☆33 · Updated 2 years ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆150 · Updated 4 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated last year