alperiox / Compact-Language-Models-via-Pruning-and-Knowledge-Distillation

Unofficial implementation of https://arxiv.org/pdf/2407.14679

☆50

Alternatives and similar repositories for Compact-Language-Models-via-Pruning-and-Knowledge-Distillation

Users that are interested in Compact-Language-Models-via-Pruning-and-Knowledge-Distillation are comparing it to the libraries listed below

melisa-writer / short-transformers
Prune transformer layers
☆74 Updated last year
UbiquitousLearning / SLM_Survey
☆100 Updated last year
samchaineau / llm_slerp_generation
Repo hosting code and materials for speeding up LLM inference using token merging.
☆37 Updated last month
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆118 Updated last year
VITA-Group / WeLore
[ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications
☆51 Updated 3 weeks ago
CASE-Lab-UMD / LLM-Drop
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
☆180 Updated last week
HanGuo97 / lq-lora
☆128 Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60 Updated last year
fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆99 Updated 8 months ago
jiwonsong-dev / SLEB
[ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
☆37 Updated 9 months ago
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆147 Updated last year
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆346 Updated 6 months ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162 Updated 7 months ago
FasterDecoding / BitDelta
☆203 Updated 11 months ago
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆156 Updated 7 months ago
SalesforceAIResearch / GemFilter
☆85 Updated last week
astramind-ai / Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆175 Updated last year
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆115 Updated 9 months ago
NVlabs / Minitron
A family of compressed models obtained via pruning and knowledge distillation
☆356 Updated 2 weeks ago
thepowerfuldeez / OLMo
My fork of Allen AI's OLMo for educational purposes.
☆30 Updated 11 months ago
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆120 Updated 11 months ago
JayZhang42 / SLED
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433
☆110 Updated 11 months ago
arcee-ai / EvolKit
EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…
☆242 Updated last year
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48 Updated last year
CASE-Lab-UMD / Unified-MoE-Compression
The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".
☆83 Updated 8 months ago
zenrran4nlp / Awesome-LLM-Inference-Serving
☆46 Updated 6 months ago
minyoungg / LTE
☆69 Updated last year
hetailang / SqueezeAttention
☆38 Updated last year
wdlctc / mini-s
☆52 Updated last year
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆103 Updated last month