melisa-writer / short-transformers

Prune transformer layers

☆74

Alternatives and similar repositories for short-transformers

Users interested in short-transformers are comparing it to the libraries listed below.

mengxiayu / LLMSuperWeight
Code for studying the super weight in LLMs
☆120 · Updated 11 months ago
bminixhofer / zett
Code for Zero-Shot Tokenizer Transfer
☆141 · Updated 10 months ago
CASE-Lab-UMD / LLM-Drop
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
☆180 · Updated last week
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60 · Updated last year
kyleliang919 / Online-Subspace-Descent
[NeurIPS 2024] Low rank memory efficient optimizer without SVD
☆30 · Updated 4 months ago
FasterDecoding / BitDelta
☆203 · Updated 11 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202 · Updated last year
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆91 · Updated 4 months ago
HanGuo97 / lq-lora
☆128 · Updated last year
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆115 · Updated 9 months ago
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆156 · Updated 7 months ago
booydar / babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
☆215 · Updated 2 months ago
alperiox / Compact-Language-Models-via-Pruning-and-Knowledge-Distillation
Unofficial implementation of https://arxiv.org/pdf/2407.14679
☆50 · Updated last year
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48 · Updated last year
NathanGodey / qfilters
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆35 · Updated 8 months ago
joey00072 / ohara
Collection of autoregressive model implementations
☆86 · Updated 6 months ago
VITA-Group / WeLore
[ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications
☆51 · Updated 3 weeks ago
minyoungg / LTE
☆69 · Updated last year
astramind-ai / Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆175 · Updated last year
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆118 · Updated last year
wdlctc / mini-s
☆52 · Updated last year
SalesforceAIResearch / GemFilter
☆85 · Updated last week
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆147 · Updated last year
IST-DASLab / peft-rosa
A fork of the PEFT library, supporting Robust Adaptation (RoSA)
☆15 · Updated last year
RobertCsordas / moeut
☆88 · Updated last year
samchaineau / llm_slerp_generation
Repo hosting code and materials related to speeding up LLM inference using token merging.
☆37 · Updated last month
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆173 · Updated 4 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆85 · Updated last year
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆277 · Updated last year
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆103 · Updated last month