tding1 / Efficient-LLM-Survey

The Efficiency Spectrum of LLM

☆53

Alternatives and similar repositories for Efficient-LLM-Survey:

Users who are interested in Efficient-LLM-Survey are comparing it to the repositories listed below.

raymin0223 / fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
☆56 Updated 4 months ago
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆79 Updated last year
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆48 Updated last year
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆42 Updated 7 months ago
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆38 Updated 11 months ago
PKU-ML / LongPPL
☆27 Updated 3 months ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆48 Updated 5 months ago
hdong920 / GRIFFIN
☆36 Updated 5 months ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆43 Updated 2 weeks ago
tanyuqian / redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
☆64 Updated 2 months ago
allenai / OLMo-core
PyTorch building blocks for the OLMo ecosystem
☆54 Updated this week
srush / LLM-Talk
☆48 Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆58 Updated 9 months ago
HanGuo97 / lq-lora
☆125 Updated last year
shaochenze / PatchTrain
Code for paper "Patch-Level Training for Large Language Models"
☆80 Updated 3 months ago
locuslab / scaling_laws_data_filtering
☆64 Updated 10 months ago
berlino / gated_linear_attention
☆99 Updated 11 months ago
kyleliang919 / Online-Subspace-Descent
This repo is based on https://github.com/jiaweizzhao/GaLore
☆24 Updated 5 months ago
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆114 Updated 11 months ago
UNITES-Lab / MC-SMoE
[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆72 Updated 8 months ago
song-wx / SIFT
[ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely
☆21 Updated 7 months ago
ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆27 Updated 10 months ago
snu-mllab / Context-Memory
PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
☆53 Updated 10 months ago
CASE-Lab-UMD / Unified-MoE-Compression
The official implementation of the paper "Demystifying the Compression of Mixture-of-Experts Through a Unified Framework".
☆59 Updated 3 months ago
hahnyuan / ASVD4LLM
Activation-aware Singular Value Decomposition for Compressing Large Language Models
☆56 Updated 3 months ago
VITA-Group / Ms-PoE
"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw…
☆25 Updated 9 months ago
IST-DASLab / RoSA
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆38 Updated last year
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆77 Updated 4 months ago
Zanette-Labs / SpeculativeRejection
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
☆39 Updated 3 months ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆28 Updated 8 months ago