Alternatives and similar repositories for parameter-efficient-moe
☆274 · Oct 31, 2023 · Updated 2 years ago
Users that are interested in parameter-efficient-moe are comparing it to the libraries listed below.
- [SIGIR'24] The official implementation code of MOELoRA. ☆188 · Jul 22, 2024 · Updated last year
- ☆176 · Jul 22, 2024 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,001 · Dec 6, 2024 · Updated last year
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment ☆401 · Apr 29, 2024 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24) ☆145 · Sep 20, 2024 · Updated last year
- Mixture-of-Experts (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations. ☆12 · Feb 11, 2024 · Updated 2 years ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,660 · Mar 8, 2024 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆226 · Sep 18, 2025 · Updated 5 months ago
- ☆415 · Nov 2, 2023 · Updated 2 years ago
- batched loras ☆350 · Sep 6, 2023 · Updated 2 years ago
- LongQLoRA: Extend Context Length of LLMs Efficiently ☆168 · Nov 12, 2023 · Updated 2 years ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆279Nov 3, 2023Updated 2 years ago
- [ICLR‘24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"☆102Jun 20, 2025Updated 8 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline"☆118Jun 15, 2024Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆45Oct 1, 2025Updated 5 months ago
- Repository for Skill Set Optimization☆14Jul 26, 2024Updated last year
- ☆37Oct 10, 2024Updated last year
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition☆669Jul 22, 2024Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]☆588Dec 9, 2024Updated last year
- Official repository of NEFTune: Noisy Embeddings Improve Instruction Finetuning ☆409 · May 17, 2024 · Updated last year
- Codebase for Merging Language Models (ICML 2024) ☆863 · May 5, 2024 · Updated last year
- Serving multiple LoRA-finetuned LLMs as one ☆1,145 · May 8, 2024 · Updated last year
- LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transform… ☆1,463 · Nov 7, 2023 · Updated 2 years ago
- A collection of AWESOME things about mixture-of-experts ☆1,266 · Dec 8, 2024 · Updated last year
- Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral) ☆2,693 · Aug 14, 2024 · Updated last year
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM ☆60 · May 28, 2024 · Updated last year
- Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates ☆473 · Apr 21, 2024 · Updated last year
- ☆96 · Oct 8, 2023 · Updated 2 years ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Lengths (ICLR 2024) ☆209 · May 20, 2024 · Updated last year
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆20 · Sep 15, 2023 · Updated 2 years ago
- Implementation of DoRA ☆308 · Jun 7, 2024 · Updated last year
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆114 · May 2, 2022 · Updated 3 years ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,899 · Jan 21, 2024 · Updated 2 years ago
- ☆29 · Jan 23, 2024 · Updated 2 years ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 · Feb 27, 2024 · Updated 2 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Feb 28, 2023 · Updated 3 years ago
- Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models" ☆102 · Jul 9, 2024 · Updated last year
- ☆301 · Jul 10, 2025 · Updated 7 months ago