declare-lab / dellaLinks

DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

☆36

Alternatives and similar repositories for della

Users that are interested in della are comparing it to the libraries listed below

Sorting:

yule-BUAA / MergeLLM
Codes for Merging Large Language Models
☆33Updated last year
qiuzh20 / EMoE
Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]
☆35Updated last year
SempraETY / Pruning-via-Merging
☆20Updated 10 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆82Updated 6 months ago
MingLiiii / Layer_Gradient
[ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆75Updated 3 months ago
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆147Updated last year
TUDB-Labs / MoE-PEFT
An Efficient LLM Fine-Tuning Factory Optimized for MoE PEFT
☆124Updated 7 months ago
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆121Updated last year
GeniusHTX / TALE
☆133Updated last month
harveyhuang18 / EMR_Merging
[NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging
☆69Updated 7 months ago
ShiZhengyan / InstructionModelling
[NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"
☆39Updated last year
Lucky-Lance / Expert_Sparsity
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
☆105Updated last year
TUDB-Labs / MixLoRA
State-of-the-art Parameter-Efficient MoE Fine-tuning Method
☆191Updated last year
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆86Updated 8 months ago
songmzhang / DSKD
Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models". A general white-box KD framework for both same…
☆60Updated last month
prateeky2806 / ties-merging
☆194Updated last year
SalesforceAIResearch / GemFilter
☆85Updated 9 months ago
UCSB-NLP-Chang / ThinkPrune
☆44Updated 3 weeks ago
FeiyuZhang98 / IncreLoRA
☆33Updated 2 years ago
UNITES-Lab / MC-SMoE
[ICLR‘24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆95Updated 3 months ago
zjunlp / LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
☆106Updated 6 months ago
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆84Updated last week
qiuzh20 / gated_attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…
☆89Updated last month
alessiodevoto / l2compress
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆16Updated 10 months ago
SeanLeng1 / Reward-Calibration
☆18Updated 10 months ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆55Updated 8 months ago
Infini-AI-Lab / S2FT
☆19Updated 9 months ago
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75Updated 4 months ago
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆102Updated last week
hahahawu / Long-to-Short-via-Model-Merging
Model merging is a highly efficient approach for long-to-short reasoning.
☆86Updated 4 months ago