nverma1 / merging-text-transformersLinks
Code for "Merging Text Transformers from Different Initializations"
☆20Updated 4 months ago
Alternatives and similar repositories for merging-text-transformers
Users that are interested in merging-text-transformers are comparing it to the libraries listed below
Sorting:
- ☆17Updated 5 months ago
- Lottery Ticket Adaptation☆39Updated 7 months ago
- [ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning☆19Updated this week
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)☆27Updated 4 months ago
- Code for T-MARS data filtering☆35Updated last year
- Exploration of automated dataset selection approaches at large scales.☆45Updated 3 months ago
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"☆38Updated last year
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models☆46Updated last month
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433☆26Updated 6 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"☆32Updated last year
- Codebase for Instruction Following without Instruction Tuning☆34Updated 9 months ago
- Official implementation of the paper: "A deeper look at depth pruning of LLMs"☆15Updated 11 months ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…☆23Updated 2 months ago
- This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated last year
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning☆31Updated 2 years ago
- Code for paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts"☆24Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆29Updated 7 months ago
- ☆32Updated 5 months ago
- ☆28Updated 3 months ago
- Codes for Merging Large Language Models☆32Updated 10 months ago
- Code for "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"☆15Updated 2 months ago
- ☆25Updated last year
- ☆28Updated 7 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation☆40Updated 8 months ago
- The official GitHub page for paper "NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional St…☆22Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore☆28Updated 9 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆32Updated last year
- Efficient Scaling laws and collaborative pretraining.☆16Updated 4 months ago
- ☆27Updated 2 years ago
- ☆20Updated 11 months ago