nverma1 / merging-text-transformers

Code for "Merging Text Transformers from Different Initializations"

☆20

Alternatives and similar repositories for merging-text-transformers

Users who are interested in merging-text-transformers are comparing it to the libraries listed below.

Infini-AI-Lab / S2FT
☆17Updated 5 months ago
kiddyboots216 / lottery-ticket-adaptation
Lottery Ticket Adaptation
☆39Updated 7 months ago
tianyi-lab / Mosaic-IT
[ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning
☆19Updated this week
zzwjames / FailureLLMUnlearning
An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)
☆27Updated 4 months ago
locuslab / T-MARS
Code for T-MARS data filtering
☆35Updated last year
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆45Updated 3 months ago
ShiZhengyan / InstructionModelling
[NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"
☆38Updated last year
Wang-ML-Lab / multimodal-needle-in-a-haystack
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
☆46Updated last month
JayZhang42 / SLED
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433
☆26Updated 6 months ago
maszhongming / ParaKnowTransfer
Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"
☆32Updated last year
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆34Updated 9 months ago
shoaibahmed / llm_depth_pruning
Official implementation of the paper: "A deeper look at depth pruning of LLMs"
☆15Updated 11 months ago
psunlpgroup / VisOnlyQA
This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…
☆23Updated 2 months ago
limenlp / safer-instruct
This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17Updated last year
BaohaoLiao / mefts
[NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
☆31Updated 2 years ago
tanganke / weight-ensembling_MoE
Code for paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts"
☆24Updated last year
wang-kee / LiNeS
Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"
☆29Updated 7 months ago
katiekang1998 / reasoning_generalization
☆32Updated 5 months ago
hartvigsen-group / composable-interventions
☆28Updated 3 months ago
yule-BUAA / MergeLLM
Codes for Merging Large Language Models
☆32Updated 10 months ago
tianyi-lab / C3PO
Code for "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"
☆15Updated 2 months ago
prateeky2806 / ComPEFT
☆25Updated last year
Cohere-Labs-Community / iterative-data-selection
☆28Updated 7 months ago
ml-jku / EVA
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
☆40Updated 8 months ago
wangxu0820 / NegativePrompt
The official GitHub page for paper "NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional St…
☆22Updated last year
kyleliang919 / Online-Subspace-Descent
This repo is based on https://github.com/jiaweizzhao/GaLore
☆28Updated 9 months ago
fangyuan-ksgk / CoT-Reasoning-without-Prompting
Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting
☆32Updated last year
IBM / ColPret
Efficient Scaling laws and collaborative pretraining.
☆16Updated 4 months ago
ctlllll / reward_collapse
☆27Updated 2 years ago
formll / resolving-scaling-law-discrepancies
☆20Updated 11 months ago