Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆145 · Sep 20, 2024 · Updated last year
Alternatives and similar repositories for Parameter-Efficient-MoE
Users interested in Parameter-Efficient-MoE are comparing it to the libraries listed below.
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · May 22, 2024 · Updated last year
- 5X faster, 60% less memory QLoRA finetuning ☆21 · May 28, 2024 · Updated last year
- ☆274 · Oct 31, 2023 · Updated 2 years ago
- Repository for CPU Kernel Generation for LLM Inference ☆28 · Jul 13, 2023 · Updated 2 years ago
- LLM-Training-API: Including Embeddings & ReRankers, mergekit, LaserRMT ☆27 · Feb 18, 2024 · Updated 2 years ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆241 · May 26, 2024 · Updated last year
- FuseAI Project ☆592 · Jan 25, 2025 · Updated last year
- ☆129 · Jan 22, 2024 · Updated 2 years ago
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition ☆670 · Jul 22, 2024 · Updated last year
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆116 · May 24, 2024 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,000 · Dec 6, 2024 · Updated last year
- ☆138 · Aug 19, 2024 · Updated last year
- Codebase for Merging Language Models (ICML 2024) ☆863 · May 5, 2024 · Updated last year
- FuseAI Project ☆93 · Jan 25, 2025 · Updated last year
- ☆17 · May 2, 2024 · Updated last year
- Tools for merging pretrained large language models. ☆6,945 · Mar 15, 2026 · Updated 3 weeks ago
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ☆160 · Feb 9, 2024 · Updated 2 years ago
- Code for Zero-Shot Tokenizer Transfer ☆144 · Jan 14, 2025 · Updated last year
- ☆177 · Jul 22, 2024 · Updated last year
- Mixture-of-Experts (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations. ☆12 · Feb 11, 2024 · Updated 2 years ago
- [SIGIR'24] The official implementation code of MOELoRA. ☆192 · Jul 22, 2024 · Updated last year
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆643 · Mar 4, 2024 · Updated 2 years ago
- Synthetic Alphabet Dataset ☆19 · Mar 27, 2025 · Updated last year
- Official PyTorch implementation of QA-LoRA ☆145 · Mar 13, 2024 · Updated 2 years ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆925 · Feb 26, 2026 · Updated last month
- State-of-the-art Parameter-Efficient MoE Fine-tuning Method ☆203 · Aug 22, 2024 · Updated last year
- A bagel, with everything. ☆326 · Apr 11, 2024 · Updated last year
- Triangles in action! Triton ☆16 · Feb 15, 2024 · Updated 2 years ago
- ☆13 · Feb 18, 2024 · Updated 2 years ago
- Official code for ReLoRA from the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates" ☆474 · Apr 21, 2024 · Updated last year
- [NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich… ☆1,114 · Oct 7, 2024 · Updated last year
- A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI ☆772 · Dec 15, 2023 · Updated 2 years ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆101 · Sep 30, 2024 · Updated last year
- For releasing code related to compression methods for transformers, accompanying our publications ☆458 · Jan 16, 2025 · Updated last year
- Token Omission Via Attention ☆127 · Oct 13, 2024 · Updated last year
- Generate interleaved text and image content in a structured format you can directly pass to downstream APIs. ☆29 · Oct 18, 2024 · Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆450 · Oct 16, 2024 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆226 · Sep 18, 2025 · Updated 6 months ago
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024] ☆592 · Dec 9, 2024 · Updated last year