qiuzh20 / EMoE
Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]
☆39 · May 28, 2024 · Updated last year
Alternatives and similar repositories for EMoE
Users that are interested in EMoE are comparing it to the libraries listed below
- [ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers ☆26 · Jun 7, 2023 · Updated 2 years ago
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆29 · Aug 4, 2024 · Updated last year
- Research Artifact For Our Submission To VLDB ☆10 · Oct 27, 2021 · Updated 4 years ago
- [NeurIPS 2024] Efficiency for Free: Ideal Data Are Transportable Representations ☆19 · Jan 19, 2025 · Updated last year
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but … ☆13 · Dec 1, 2022 · Updated 3 years ago
- An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library. ☆13 · Jan 9, 2024 · Updated 2 years ago
- Source code for a LoRA-based continual relation extraction method. ☆14 · Sep 25, 2023 · Updated 2 years ago
- ☆143 · Jul 21, 2024 · Updated last year
- ☆15 · Oct 30, 2021 · Updated 4 years ago
- Repository for Sparse Universal Transformers ☆20 · Oct 23, 2023 · Updated 2 years ago
- Instruct-tuning LLaMA on consumer hardware with machine-translated data ☆19 · Apr 17, 2023 · Updated 2 years ago
- [ACL 2023] Source code for "Decouple knowledge from parameters for plug-and-play language modeling" ☆20 · Sep 18, 2023 · Updated 2 years ago
- ☆20 · May 28, 2025 · Updated 8 months ago
- Code release for Dataless Knowledge Fusion by Merging Weights of Language Models (https://openreview.net/forum?id=FCnohuR6AnM) ☆93 · Jul 25, 2023 · Updated 2 years ago
- Layerwise Batch Entropy Regularization ☆24 · Aug 3, 2022 · Updated 3 years ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆20 · Sep 15, 2023 · Updated 2 years ago
- ☆19 · Oct 31, 2022 · Updated 3 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Jul 19, 2024 · Updated last year
- [ICCV 2023 Oral] Official PyTorch implementation of our paper for semi-supervised continual learning "A soft nearest-neighbor framework f… ☆25 · Dec 17, 2024 · Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆60 · Feb 7, 2025 · Updated last year
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021). ☆59 · Jan 14, 2022 · Updated 4 years ago
- General system research material (not limited to paper) reading notes. ☆22 · Mar 17, 2021 · Updated 4 years ago
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights ☆28 · Oct 28, 2024 · Updated last year
- Must-read papers and blogs about parametric knowledge mechanism in LLMs. ☆34 · May 9, 2025 · Updated 9 months ago
- A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings. ☆75 · Aug 9, 2024 · Updated last year
- [NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations" ☆61 · Nov 26, 2023 · Updated 2 years ago
- Source code of paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing" ☆31 · Oct 24, 2024 · Updated last year
- ☆32 · Oct 30, 2023 · Updated 2 years ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆32 · Nov 4, 2024 · Updated last year
- Adding new tasks to T0 without catastrophic forgetting ☆33 · Oct 20, 2022 · Updated 3 years ago
- ☆273 · Oct 31, 2023 · Updated 2 years ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging" ☆37 · Oct 11, 2023 · Updated 2 years ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆226 · Sep 18, 2025 · Updated 4 months ago
- A Data-Driven Approach to Predict the Success of Bank Telemarketing ☆10 · Apr 27, 2021 · Updated 4 years ago
- This repository is the official implementation of Topology-Informed Graph Transformer (Choi et al., GRaM Workshop at ICML 2024). ☆12 · Dec 28, 2024 · Updated last year
- ☆91 · Aug 18, 2024 · Updated last year
- Data preparation code for Amber 7B LLM ☆94 · May 10, 2024 · Updated last year
- ☆10 · Jul 16, 2023 · Updated 2 years ago
- Various tools to download, convert and process the full text of scientific articles ☆10 · Apr 2, 2024 · Updated last year