GenRobo / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆62 · Updated last year
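Before the list of alternatives, here is a minimal sketch of the Matryoshka idea behind MatMamba: one set of weights is trained so that nested prefixes of the hidden dimension also work as smaller standalone models. This is an illustration under that assumption, not MatMamba's actual API; all names and the toy loss are invented for the example.

```python
# Sketch of Matryoshka-style nested widths (illustrative, not MatMamba's code).
import torch
import torch.nn as nn

class MatryoshkaLinear(nn.Module):
    """A linear layer whose leading m input/output features form a valid sub-layer."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) / d_in ** 0.5)
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x: torch.Tensor, m: int) -> torch.Tensor:
        # Slice the leading m rows/columns: the sub-layer shares weights with the full one.
        return x[..., :m] @ self.weight[:m, :m].T + self.bias[:m]

layer = MatryoshkaLinear(512, 512)
x = torch.randn(8, 512)
# Joint training: sum a (toy) loss over a ladder of nested widths,
# so every prefix width stays usable at inference time.
loss = sum(layer(x, m).pow(2).mean() for m in (64, 128, 256, 512))
loss.backward()
```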
Alternatives and similar repositories for MatMamba
Users interested in MatMamba are comparing it to the repositories listed below
- ☆82 · Updated last year
- Implementation of Mirasol, a SOTA multimodal autoregressive model out of Google DeepMind, in PyTorch ☆91 · Updated 2 years ago
- ☆91 · Updated last year
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆44 · Updated 5 months ago
- ☆59 · Updated last month
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" (a sketch of the gradient filter follows this list) ☆103 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 · Updated last year
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind ☆72 · Updated last year
- ☆33 · Updated last year
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" (a sketch of the test-time update follows this list) ☆25 · Updated this week
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind (a sketch of product-key retrieval follows this list) ☆133 · Updated last month
- Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models" ☆51 · Updated 3 months ago
- ☆69 · Updated last year
- A byte-level decoder architecture that matches the performance of tokenized Transformers ☆66 · Updated last year
- PyTorch implementation of models from the Zamba2 series ☆186 · Updated 11 months ago
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch (a sketch of the compressive memory follows this list) ☆58 · Updated this week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆102 · Updated last year
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆45 · Updated 2 months ago
- GoldFinch and other hybrid transformer components ☆45 · Updated last year
- The official repository for "HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction" ☆41 · Updated 8 months ago
- ☆80 · Updated last year
- Code for PHATGOOSE, introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 · Updated last year
- An unofficial PyTorch implementation of "Efficient Infinite Context Transformers with Infini-attention" ☆54 · Updated last year
- Official repository of Pretraining Without Attention (BiGS), the first model to achieve BERT-level transfer learning on the GLUE benchmark ☆116 · Updated last year
- A repository for research on medium-sized language models ☆77 · Updated last year
- Fork of the Flame repo for training some new work in development ☆19 · Updated this week
- ☆55 · Updated last year
- MEXMA: Token-level objectives improve sentence representations ☆42 · Updated 11 months ago
- Implementation of Infini-Transformer in PyTorch ☆113 · Updated 11 months ago
- Experimental scripts for researching data-adaptive learning-rate scheduling ☆22 · Updated 2 years ago
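For the Grokfast entry above: the paper's core trick is a gradient filter that keeps an exponential moving average of each parameter's gradient and adds an amplified copy of that slow component back before the optimizer step. Below is a minimal sketch of the EMA variant; the hyperparameter values and the surrounding loop are illustrative, not taken from that repository.

```python
# Sketch of the Grokfast-EMA gradient filter (illustrative hyperparameters).
import torch

def gradfilter_ema(model, grads=None, alpha: float = 0.98, lamb: float = 2.0):
    if grads is None:
        grads = {n: p.grad.detach().clone() for n, p in model.named_parameters()
                 if p.grad is not None}
    for n, p in model.named_parameters():
        if p.grad is None:
            continue
        grads[n] = grads[n] * alpha + p.grad.detach() * (1 - alpha)  # slow component
        p.grad = p.grad + grads[n] * lamb                            # amplify it
    return grads

# Usage inside a training loop (sketch):
#   loss.backward()
#   ema_grads = gradfilter_ema(model, ema_grads)
#   optimizer.step(); optimizer.zero_grad()
```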
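For the "Learning to (Learn at Test Time)" entry: the recurrent hidden state is itself a tiny model updated by a gradient step of a self-supervised reconstruction loss on each token. A minimal single-matrix sketch of that update rule is below; the projections, step size, and loss are illustrative stand-ins, not the paper's full architecture.

```python
# Sketch of a TTT-style linear layer: the "hidden state" W is trained online.
import torch

def ttt_linear(x, k_proj, v_proj, q_proj, lr: float = 0.1):
    """x: (seq, d). Returns (seq, d) outputs from an online-updated linear state."""
    d = x.shape[-1]
    W = torch.zeros(d, d)                 # the learnable hidden state
    outs = []
    for x_t in x:
        k, v, q = k_proj @ x_t, v_proj @ x_t, q_proj @ x_t
        err = W @ k - v                   # self-supervised reconstruction error
        W = W - lr * torch.outer(err, k)  # one gradient step on ||W k - v||^2 / 2
        outs.append(W @ q)                # read out with the updated state
    return torch.stack(outs)

d = 16
x = torch.randn(32, d)
y = ttt_linear(x, torch.randn(d, d) / 4, torch.randn(d, d) / 4, torch.randn(d, d) / 4)
```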
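For the PEER entry: "Mixture of A Million Experts" scores N = n·n tiny experts without materializing all N scores, by splitting the query in half and combining top-k matches from two small sub-key sets. The sketch below shows that product-key retrieval for a single head with single-neuron experts; all shapes and names are illustrative.

```python
# Sketch of product-key expert retrieval (single head, no batching).
import torch
import torch.nn.functional as F

def peer_forward(x, keys1, keys2, w_up, w_down, k: int = 4):
    """x: (d,). keys1/keys2: (n, d/2). w_up, w_down: (N, d) with N = n*n."""
    n, half = keys1.shape
    q1, q2 = x[:half], x[half:]
    s1, i1 = (keys1 @ q1).topk(k)                 # top-k over first sub-keys
    s2, i2 = (keys2 @ q2).topk(k)                 # top-k over second sub-keys
    cand = (s1[:, None] + s2[None, :]).flatten()  # k*k candidate score sums
    top, pos = cand.topk(k)
    idx = i1[pos // k] * n + i2[pos % k]          # flat indices of winning experts
    gates = F.softmax(top, dim=0)
    # Each expert is a single neuron: down_i * gelu(up_i . x)
    h = F.gelu(w_up[idx] @ x)                     # (k,)
    return (gates * h) @ w_down[idx]              # (d,)

d, n = 32, 64                                     # N = 4096 experts
x = torch.randn(d)
out = peer_forward(x, torch.randn(n, d // 2), torch.randn(n, d // 2),
                   torch.randn(n * n, d), torch.randn(n * n, d))
```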
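For the Infini-attention entries: "Leave No Context Behind" folds each segment's keys and values into a fixed-size matrix memory with a linear-attention update, and lets queries retrieve from it, so context length no longer grows the state. A minimal single-head sketch of the memory update and retrieval follows; the local-attention path and gating from the paper are omitted, and the names are illustrative.

```python
# Sketch of the Infini-attention compressive memory (single head, no gating).
import torch
import torch.nn.functional as F

def sigma(x):                        # the paper's nonlinearity: ELU + 1
    return F.elu(x) + 1.0

def infini_memory(segments):
    """segments: list of (q, k, v) tensors of shape (seq, d). Yields memory readouts."""
    d = segments[0][0].shape[-1]
    M = torch.zeros(d, d)            # compressive memory matrix
    z = torch.zeros(d)               # normalization term
    for q, k, v in segments:
        # Retrieve from memory written by previous segments.
        read = (sigma(q) @ M) / (sigma(q) @ z + 1e-6).unsqueeze(-1)
        yield read
        # Fold this segment into memory (linear-attention style update).
        M = M + sigma(k).T @ v
        z = z + sigma(k).sum(dim=0)

d = 16
segs = [(torch.randn(8, d), torch.randn(8, d), torch.randn(8, d)) for _ in range(3)]
reads = list(infini_memory(segs))
```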