GenRobo / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆59Updated 5 months ago
Alternatives and similar repositories for MatMamba
Users that are interested in MatMamba are comparing it to the libraries listed below
Sorting:
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.☆29Updated last month
- MEXMA: Token-level objectives improve sentence representations☆41Updated 4 months ago
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)☆50Updated last month
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data☆21Updated 9 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025]☆19Updated 2 months ago
- A testbed for agents and environments that can automatically improve models through data generation.☆24Updated 2 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…☆41Updated last month
- Experimental scripts for researching data adaptive learning rate scheduling.☆23Updated last year
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L…☆36Updated 5 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆57Updated 8 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆54Updated 8 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation☆39Updated 7 months ago
- ☆56Updated this week
- Official implementation of ECCV24 paper: POA☆24Updated 9 months ago
- ☆68Updated 8 months ago
- ☆25Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆26Updated 6 months ago
- ☆78Updated 8 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite.☆33Updated last year
- PyTorch implementation for MRL☆18Updated last year
- Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.☆17Updated last month
- ☆81Updated last year
- Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation☆63Updated last week
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"☆37Updated last year
- ☆72Updated 3 weeks ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…☆55Updated 3 weeks ago
- ☆22Updated 4 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore☆27Updated 7 months ago
- Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models."☆40Updated last month
- Induce brain-like topographic structure in your neural networks☆58Updated 2 months ago