Arenaa / Accelerated-Generation-Techniques

This repository contains papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs).

☆11

Alternatives and similar repositories for Accelerated-Generation-Techniques

Users who are interested in Accelerated-Generation-Techniques are comparing it to the libraries listed below.

ziplab / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆30 Updated last year
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆36 Updated last year
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆94 Updated 11 months ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆55 Updated 8 months ago
ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆28 Updated last year
IST-DASLab / RoSA
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆44 Updated last year
Infini-AI-Lab / S2FT
☆19 Updated 9 months ago
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆36 Updated last year
song-wx / SIFT
[ICML2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely
☆22 Updated last year
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆55 Updated 2 years ago
duterscmy / CD-MoE
Official PyTorch implementation of CD-MOE
☆12 Updated 7 months ago
sramshetty / mixture-of-depths
An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆36 Updated last year
VILA-Lab / GBLM-Pruner
Is gradient information useful for pruning LLMs?
☆47 Updated 2 months ago
pixeli99 / MixLN
[ICLR 2025] Official PyTorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆26 Updated 3 months ago
locuslab / scaling_laws_data_filtering
☆65 Updated last year
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29 Updated last month
Infini-AI-Lab / gsm_infinite
☆55 Updated 4 months ago
shaochenze / PatchTrain
Code for paper "Patch-Level Training for Large Language Models"
☆89 Updated 11 months ago
AkideLiu / MiniCache
☆10 Updated last year
yale-nlp / refdpo
☆16 Updated last year
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆51 Updated 6 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆89 Updated last year
SalesforceAIResearch / GemFilter
☆85 Updated 9 months ago
ShiZhengyan / InstructionModelling
[NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"
☆39 Updated last year
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆56 Updated last week
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆102 Updated 2 weeks ago
snu-mllab / Context-Memory
PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
☆61 Updated last year
shoaibahmed / llm_depth_pruning
Official implementation of the paper: "A deeper look at depth pruning of LLMs"
☆15 Updated last year
microsoft / AutoMoE
AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers
☆47 Updated 3 years ago
RobertCsordas / moeut
☆86 Updated last year