Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆146 · Sep 20, 2024 · Updated last year
Alternatives and similar repositories for Parameter-Efficient-MoE
Users that are interested in Parameter-Efficient-MoE are comparing it to the libraries listed below.
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · May 22, 2024 · Updated 2 years ago
- 5X faster 60% less memory QLoRA finetuning ☆21 · May 28, 2024 · Updated 2 years ago
- ☆276 · Oct 31, 2023 · Updated 2 years ago
- Repository for CPU Kernel Generation for LLM Inference ☆28 · Jul 13, 2023 · Updated 2 years ago
- LLM-Training-API: Including Embeddings & ReRankers, mergekit, LaserRMT ☆27 · Feb 18, 2024 · Updated 2 years ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆240 · May 26, 2024 · Updated 2 years ago
- FuseAI Project ☆601 · Jan 25, 2025 · Updated last year
- ☆129 · Jan 22, 2024 · Updated 2 years ago
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition ☆671 · Jul 22, 2024 · Updated last year
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆120 · May 24, 2024 · Updated 2 years ago
- A library for easily merging multiple LLM experts, and efficiently train the merged LLM. ☆517 · Aug 26, 2024 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,004 · Dec 6, 2024 · Updated last year
- ☆137 · Aug 19, 2024 · Updated last year
- Codebase for Merging Language Models (ICML 2024) ☆869 · May 5, 2024 · Updated 2 years ago
- FuseAI Project ☆93 · Jan 25, 2025 · Updated last year
- ☆17 · May 2, 2024 · Updated 2 years ago
- Tools for merging pretrained large language models. ☆7,190 · Jun 17, 2026 · Updated last week
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ☆161 · Feb 9, 2024 · Updated 2 years ago
- Code for Zero-Shot Tokenizer Transfer ☆145 · Jan 14, 2025 · Updated last year
- ☆179 · Jul 22, 2024 · Updated last year
- Mixture of Expert (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations. ☆12 · Feb 11, 2024 · Updated 2 years ago
- [SIGIR'24] The official implementation code of MOELoRA. ☆193 · Jul 22, 2024 · Updated last year
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆641 · Mar 4, 2024 · Updated 2 years ago
- Synthetic Alphabet Dataset ☆19 · Mar 27, 2025 · Updated last year
- Official PyTorch implementation of QA-LoRA ☆147 · Mar 13, 2024 · Updated 2 years ago
- A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI ☆773 · Dec 15, 2023 · Updated 2 years ago
- State-of-the-art Parameter-Efficient MoE Fine-tuning Method ☆207 · Aug 22, 2024 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆945 · Feb 26, 2026 · Updated 4 months ago
- A bagel, with everything. ☆326 · Apr 11, 2024 · Updated 2 years ago
- [ACL 2025 🔥] Time Travel is a Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts ☆19 · May 22, 2025 · Updated last year
- ☆13 · Feb 18, 2024 · Updated 2 years ago
- Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates ☆474 · Apr 21, 2024 · Updated 2 years ago
- [NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich… ☆1,131 · Oct 7, 2024 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆101 · Sep 30, 2024 · Updated last year
- Token Omission Via Attention ☆130 · Oct 13, 2024 · Updated last year
- For releasing code related to compression methods for transformers, accompanying our publications ☆462 · Jan 16, 2025 · Updated last year
- Generate interleaved text and image content in a structured format you can directly pass to downstream APIs. ☆29 · Oct 18, 2024 · Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆450 · Oct 16, 2024 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆225 · Sep 18, 2025 · Updated 9 months ago