tanganke / weight-ensembling_MoE
Code for paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts"
☆18 · Updated 7 months ago
Alternatives and similar repositories for weight-ensembling_MoE:
Users interested in weight-ensembling_MoE are comparing it to the libraries listed below.
- AdaMerging: Adaptive Model Merging for Multi-Task Learning (ICLR 2024) ☆61 · Updated 2 months ago
- [NeurIPS 2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging ☆47 · Updated last month
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆42 · Updated 2 months ago
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] ☆39 · Updated 2 months ago
- Code for paper "Parameter Efficient Multi-task Model Fusion with Partial Linearization" ☆17 · Updated 4 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models" ☆38 · Updated 2 months ago
- Mosaic IT: Enhancing Instruction Tuning with Data Mosaics ☆17 · Updated 6 months ago
- Code for Merging Large Language Models ☆27 · Updated 5 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning" ☆42 · Updated last month
- [ATTRIB @ NeurIPS 2024 Oral] When Attention Sink Emerges in Language Models: An Empirical View ☆43 · Updated 3 months ago
- Representation Surgery for Multi-Task Model Merging (ICML 2024) ☆34 · Updated 3 months ago
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆32 · Updated 3 months ago
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML 2023) ☆32 · Updated last year
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆16 · Updated 7 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆48 · Updated last year
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆53 · Updated 3 months ago
- [SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates ☆67 · Updated 2 months ago
- A curated list of Model Merging methods ☆89 · Updated 4 months ago
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆32 · Updated last month
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge ☆62 · Updated last month
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆31 · Updated 2 months ago