ScaledFoundations / MatMamba

Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"

☆59

Alternatives and similar repositories for MatMamba:

Users who are interested in MatMamba are comparing it to the repositories listed below.

ml-jku / EVA
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
☆39 Updated 6 months ago
Christina200 / Online-LoRA-official
[WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L…
☆35 Updated 5 months ago
neilwen987 / CSR_Adaptive_Rep
Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
☆58 Updated 2 weeks ago
SamsungSAILMontreal / nino
Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025]
☆18 Updated last month
Qichuzyy / POA
Official implementation of ECCV24 paper: POA
☆24 Updated 8 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆55 Updated 7 months ago
RobertCsordas / moeut
☆77 Updated 7 months ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆54 Updated 7 months ago
lucidrains / mind-evolution
Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from DeepMind
☆47 Updated 2 months ago
facebookresearch / mexma
MEXMA: Token-level objectives improve sentence representations
☆40 Updated 3 months ago
lucidrains / transformer-lm-gan
Explorations into adversarial losses on top of autoregressive loss for language modeling
☆35 Updated last month
aszala / EnvGen
Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)
☆29 Updated 9 months ago
lucidrains / self-reasoning-tokens-pytorch
Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
☆55 Updated 11 months ago
Aleph-Alpha / trigrams
☆54 Updated 7 months ago
pchizhov / picky_bpe
A BPE modification that removes intermediate tokens during tokenizer training.
☆25 Updated 4 months ago
kyegomez / TTL
PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"
☆24 Updated this week
ml-jku / hopfield-boosting
☆31 Updated 11 months ago
foundation-model-stack / bamba
Train, tune, and run inference with the Bamba model
☆88 Updated 3 months ago
SriramB-98 / vit-decompose
☆22 Updated 3 months ago
KaiNylund / lm-weights-encode-time
☆67 Updated 8 months ago
huggingface / pixparse
Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data
☆21 Updated 8 months ago
prateeky2806 / ComPEFT
☆25 Updated last year
codezakh / DataEnvGym
A testbed for agents and environments that can automatically improve models through data generation.
☆23 Updated last month
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆98 Updated 3 months ago
data-for-agents / insta
Official Repo for InSTA: Towards Internet-Scale Training For Agents
☆24 Updated this week
hyperevolnet / Terminator
The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction.
☆36 Updated last week
epfml / DenseFormer
☆79 Updated last year
arcee-ai / DAM
☆48 Updated 5 months ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆36 Updated last year
ltgoslo / bert-in-context
Official implementation of "BERTs are Generative In-Context Learners"
☆26 Updated last month