wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆146 · Sep 20, 2024 · Updated last year
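A minimal sketch of the dense-to-Mixture-of-Experts "upcycling" idea named in the title: each expert starts from the pretrained dense FFN and a small learned router mixes top-k expert outputs per token. This is not the repository's own code; the paper's parameter-efficient approach presumably trains lightweight adapter-style experts rather than full FFN copies, and all names and shapes below are illustrative assumptions.

```python
# Hypothetical sketch: convert a pretrained dense FFN into a small MoE layer.
# Assumption: this only illustrates the dense -> MoE pattern; the actual
# Parameter-Efficient-MoE implementation differs (parameter-efficient experts).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFromDenseFFN(nn.Module):
    def __init__(self, dense_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 4, top_k: int = 2):
        super().__init__()
        # Every expert is initialized as a copy of the pretrained dense FFN.
        self.experts = nn.ModuleList(
            [copy.deepcopy(dense_ffn) for _ in range(num_experts)]
        )
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden)
        gate_logits = self.router(x)                         # (batch, seq, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)                    # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Wrapped around each FFN block of a dense checkpoint, a layer like this keeps the pretrained weights as the starting point for every expert before instruction tuning on general tasks.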
Alternatives and similar repositories for Parameter-Efficient-MoE
Users interested in Parameter-Efficient-MoE are comparing it to the repositories listed below.
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · May 22, 2024 · Updated last year
- 5X faster 60% less memory QLoRA finetuning ☆21 · May 28, 2024 · Updated last year
- ☆273 · Oct 31, 2023 · Updated 2 years ago
- Repository for CPU Kernel Generation for LLM Inference ☆28 · Jul 13, 2023 · Updated 2 years ago
- LLM-Training-API: Including Embeddings & ReRankers, mergekit, LaserRMT ☆27 · Feb 18, 2024 · Updated last year
- FuseAI Project ☆587 · Jan 25, 2025 · Updated last year
- ☆129 · Jan 22, 2024 · Updated 2 years ago
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition ☆668 · Jul 22, 2024 · Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction' ☆240 · May 26, 2024 · Updated last year
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆114 · May 24, 2024 · Updated last year
- ☆137 · Aug 19, 2024 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,003 · Dec 6, 2024 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆38 · Jun 11, 2025 · Updated 8 months ago
- Codebase for Merging Language Models (ICML 2024) ☆864 · May 5, 2024 · Updated last year
- A library for easily merging multiple LLM experts and efficiently training the merged LLM ☆507 · Aug 26, 2024 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆102Sep 30, 2024Updated last year
- Load multiple LoRA modules simultaneously and automatically switch to the appropriate combination of LoRA modules to generate the best answer ☆159 · Feb 9, 2024 · Updated 2 years ago
- Code for Zero-Shot Tokenizer Transfer ☆142 · Jan 14, 2025 · Updated last year
- Token Omission Via Attention ☆128 · Oct 13, 2024 · Updated last year
- Tools for merging pretrained large language models ☆6,783 · Jan 26, 2026 · Updated 2 weeks ago
- FuseAI Project ☆87 · Jan 25, 2025 · Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 · Feb 27, 2024 · Updated last year
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models ☆454 · Feb 1, 2024 · Updated 2 years ago
- Official PyTorch implementation of QA-LoRA ☆145 · Mar 13, 2024 · Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024) ☆25 · Jul 15, 2025 · Updated 7 months ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts ☆226 · Sep 18, 2025 · Updated 4 months ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) ☆73 · Jun 25, 2024 · Updated last year
- A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI ☆771 · Dec 15, 2023 · Updated 2 years ago
- ☆176 · Jul 22, 2024 · Updated last year
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs ☆887 · Nov 26, 2025 · Updated 2 months ago
- A bagel, with everything. ☆326 · Apr 11, 2024 · Updated last year
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆356 · Nov 20, 2025 · Updated 2 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆640 · Mar 4, 2024 · Updated last year
- Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates ☆473 · Apr 21, 2024 · Updated last year
- ☆130 · Oct 1, 2024 · Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024] ☆588 · Dec 9, 2024 · Updated last year
- Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables ☆21 · May 18, 2025 · Updated 8 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆262 · Apr 23, 2024 · Updated last year
- Code for Scaling Laws of RoPE-based Extrapolation ☆73 · Oct 16, 2023 · Updated 2 years ago