GenRobo / MatMamba

Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"

☆61

Alternatives and similar repositories for MatMamba

Users who are interested in MatMamba are comparing it to the repositories listed below.

ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 Updated last year
RobertCsordas / moeut
☆89 Updated last year
epfml / DenseFormer
☆82 Updated last year
Aleph-Alpha-Research / trigrams
☆58 Updated 2 weeks ago
KaiNylund / lm-weights-encode-time
☆69 Updated last year
fkodom / soft-mixture-of-experts
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
☆78 Updated 2 years ago
zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆19 Updated this week
jkallini / mrt5
Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models."
☆51 Updated 2 months ago
facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆22 Updated 2 years ago
ml-jku / hopfield-boosting
☆33 Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103 Updated 11 months ago
togethercomputer / Dragonfly
☆80 Updated last year
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆31 Updated 7 months ago
joey00072 / ohara
Collection of autoregressive model implementations
☆85 Updated 7 months ago
ariG23498 / mmdp
☆30 Updated 4 months ago
lucidrains / mirasol-pytorch
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
☆90 Updated last year
MaxBelitsky / cache-steering
KV Cache Steering for Inducing Reasoning in Small Language Models
☆42 Updated 4 months ago
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆58 Updated this week
nahidalam / maya
Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya
☆123 Updated 4 months ago
lucidrains / AMIE-pytorch
Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google Deepmind
☆71 Updated last year
ml-jku / EVA
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
☆45 Updated last month
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78 Updated last year
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131 Updated last month
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆102 Updated last year
giangdip2410 / HyperRouter
Code for this paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"
☆33 Updated 2 years ago
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆91 Updated last year
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆136 Updated 6 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆186 Updated 10 months ago
mistralai / mistral-evals
☆78 Updated 2 weeks ago
ahstat / episodic-memory-benchmark
Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…
☆60 Updated 2 months ago