RAIVNLab / MatFormer-OLMoLinks

Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference"

☆27

Alternatives and similar repositories for MatFormer-OLMo

Users that are interested in MatFormer-OLMo are comparing it to the libraries listed below

Sorting:

renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆38Updated last year
VILA-Lab / GBLM-Pruner
Are gradient information useful for pruning of LLMs?
☆46Updated last year
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆33Updated last year
prateeky2806 / ComPEFT
☆26Updated last year
IST-DASLab / RoSA
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆41Updated last year
RobertCsordas / moeut
☆83Updated 11 months ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆52Updated 6 months ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆40Updated last year
thunlp / SparsingLaw
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆23Updated 8 months ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆53Updated 2 years ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62Updated last year
pixeli99 / MixLN
[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆25Updated 2 weeks ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆44Updated 10 months ago
Infini-AI-Lab / gsm_infinite
☆51Updated last month
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated 11 months ago
locuslab / scaling_laws_data_filtering
☆65Updated last year
ducdauge / sft-llm
Scaling Sparse Fine-Tuning to Large Language Models
☆16Updated last year
IST-DASLab / MicroAdam
This repository contains code for the MicroAdam paper.
☆19Updated 7 months ago
epfml / pam
☆17Updated last year
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆42Updated last year
RobertCsordas / moe_layer
sigma-MoE layer
☆20Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆98Updated 10 months ago
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Updated last year
DRSY / KV_Compression
[EMNLP 2023]Context Compression for Auto-regressive Transformers with Sentinel Tokens
☆25Updated last year
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆47Updated 3 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆27Updated 5 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆88Updated last month
shaochenze / PatchTrain
Code for paper "Patch-Level Training for Large Language Models"
☆86Updated 8 months ago