graldij / transformer-fusion
Official repository of the "Transformer Fusion with Optimal Transport" paper, published as a conference paper at ICLR 2024.
☆27Updated last year
Alternatives and similar repositories for transformer-fusion
Users who are interested in transformer-fusion are comparing it to the repositories listed below.
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024]☆45Updated 8 months ago
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging☆59Updated 3 months ago
- Code for paper "Parameter Efficient Multi-task Model Fusion with Partial Linearization"☆21Updated 9 months ago
- Task Singular Vectors: Reducing Task Interference in Model Merging. Merge models avoiding task interference through separable models.☆16Updated last month
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning☆45Updated last year
- ☆29Updated 3 weeks ago
- Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic☆26Updated 5 months ago
- This is the repository for "Model Merging by Uncertainty-Based Gradient Matching", ICLR 2024.☆27Updated last year
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. arXiv, 2024.☆13Updated 7 months ago
- Representation Surgery for Multi-Task Model Merging. ICML, 2024.☆45Updated 8 months ago
- source code for paper "Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models"☆26Updated last year
- AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024.☆83Updated 7 months ago
- ☆29Updated 4 months ago
- ☆13Updated 4 months ago
- [NeurIPS 2024] For paper Parameter Competition Balancing for Model Merging☆41Updated 8 months ago
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models".☆102Updated 2 years ago
- source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"☆46Updated 2 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆88Updated 8 months ago
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters☆35Updated 3 months ago
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"☆34Updated 5 months ago
- EMPO, A Fully Unsupervised RLVR Method☆40Updated 2 weeks ago
- code for EMNLP 2024 paper: How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for M…☆12Updated 7 months ago
- Data distillation benchmark☆66Updated last week
- Codes for Merging Large Language Models☆32Updated 10 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆38Updated 8 months ago
- [NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"☆61Updated last year
- ☆35Updated last year
- [NeurIPS 2023] "Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation"☆11Updated last year
- A curated list of Model Merging methods.☆92Updated 9 months ago
- source code of (quasi-)Givens Orthogonal Fine Tuning integrated to peft lib☆17Updated 3 months ago