andyjm3 / SLTrain

SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)

☆35

Alternatives and similar repositories for SLTrain

Users who are interested in SLTrain are comparing it to the libraries listed below.

hahnyuan / ASVD4LLM
Activation-aware Singular Value Decomposition for Compressing Large Language Models
☆80 Updated last year
osehmathias / lisa
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
☆35 Updated last year
yxli2123 / LoSparse
☆61 Updated 2 years ago
song-wx / SIFT
[ICML2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely
☆22 Updated last year
zqOuO / GWT
☆13 Updated 9 months ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆55 Updated 2 years ago
zyxxmu / DSnoT
Official PyTorch implementation of our ICLR 2024 paper, Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM…
☆50 Updated last year
OPTML-Group / DeepZero
[ICLR'24] "DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training" by Aochuan Chen*, Yimeng Zhang*, Jinghan Jia, James Di…
☆66 Updated last year
VITA-Group / Junk_DNA_Hypothesis
[ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…
☆16 Updated 6 months ago
deep-spin / adasplash
AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)
☆26 Updated 3 weeks ago
TianjinYellow / SPAM-Optimizer
☆34 Updated 7 months ago
alvin-zyl / CoLA
Implementation of CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
☆23 Updated 8 months ago
cjyaras / deep-lora-transformers
Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation (ICML'24 Oral)
☆13 Updated last year
locuslab / massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
☆184 Updated last year
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆131 Updated 3 months ago
ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆28 Updated last year
imagination-research / EEP
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
☆22 Updated 10 months ago
SempraETY / Pruning-via-Merging
☆20 Updated 11 months ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆36 Updated last year
Qualcomm-AI-research / llm-surgeon
☆33 Updated last year
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆55 Updated 8 months ago
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80 Updated 2 years ago
Model-GLUE / Model-GLUE
☆18 Updated last year
zyushun / hessian-spectrum
Code for the paper: Why Transformers Need Adam: A Hessian Perspective
☆64 Updated 7 months ago
Infini-AI-Lab / Kinetics
Kinetics: Rethinking Test-Time Scaling Laws
☆81 Updated 3 months ago
abdelfattah-lab / TokenButler
☆25 Updated 2 months ago
EnnengYang / RepresentationSurgery
Representation Surgery for Multi-Task Model Merging. ICML, 2024.
☆46 Updated last year
LIONS-EPFL / scion
☆41 Updated this week
VijayLingam95 / SVFT
☆33 Updated 8 months ago
qiuzh20 / gated_attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…
☆95 Updated last month