jongwooko / distillm-2Links

Official PyTorch implementation of DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs (ICML 2025 Oral)

☆29

Alternatives and similar repositories for distillm-2

Users that are interested in distillm-2 are comparing it to the libraries listed below

Sorting:

VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆52Updated 2 years ago
jihoontack / MAC
Online Adaptation of Language Models with a Memory of Amortized Contexts (NeurIPS 2024)
☆63Updated 11 months ago
chuanyang-Zheng / DAPE
The this is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"
☆38Updated 9 months ago
yegcjs / DiffusionLLM
Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"
☆81Updated last year
nbasyl / DoRA
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
☆124Updated last year
naver-ai / model-stock
Model Stock: All we need is just a few fine-tuned models
☆118Updated 9 months ago
UKPLab / iclr2024-model-merging
This is the repository for "Model Merging by Uncertainty-Based Gradient Matching", ICLR 2024.
☆28Updated last year
ShiZhengyan / DePT
[ICLR 2024] This is the repository for the paper titled "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning"
☆95Updated last year
VITA-Group / Junk_DNA_Hypothesis
[ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…
☆16Updated 2 months ago
BaohaoLiao / mefts
[NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
☆31Updated 2 years ago
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆91Updated 2 months ago
TsinghuaC3I / SoRA
[EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models
☆79Updated last year
arazd / ResidualPrompts
Residual Prompt Tuning: a method for faster and better prompt tuning.
☆55Updated 2 years ago
r-three / smear
☆30Updated last year
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆96Updated last week
alinlab / HOMER
Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024).
☆42Updated 11 months ago
VITA-Group / LiGO
[ICLR 2023] "Learning to Grow Pretrained Models for Efficient Transformer Training" by Peihao Wang, Rameswar Panda, Lucas Torroba Hennige…
☆92Updated last year
aim-uofa / LoRAPrune
☆56Updated 7 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆86Updated 9 months ago
thu-coai / MiniPLM
[ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models
☆50Updated 7 months ago
zhixuan-lin / forgetting-transformer
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
☆115Updated last week
TencentARC / pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
☆33Updated last year
cliang1453 / task-aware-distillation
Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023)
☆34Updated last year
mrflogs / LoRA-Pro
Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized? "
☆126Updated 3 months ago
ycjing / Awesome-Model-Merging
A curated list of Model Merging methods.
☆92Updated 10 months ago
abdelfattah-lab / TokenButler
☆23Updated 3 months ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆52Updated 5 months ago
VijayLingam95 / SVFT
☆30Updated 5 months ago
yikangshen / MoA
Mixture of Attention Heads
☆47Updated 2 years ago
ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆28Updated last year