wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆146 · Sep 20, 2024 · Updated last year
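A minimal sketch of the dense-to-Mixture-of-Experts "upcycling" idea named in the title: each expert starts from the pretrained dense FFN and a small learned router mixes top-k expert outputs per token. This is not the repository's own code; the paper's parameter-efficient approach presumably trains lightweight adapter-style experts rather than full FFN copies, and all names and shapes below are illustrative assumptions.

```python
# Hypothetical sketch: convert a pretrained dense FFN into a small MoE layer.
# Assumption: this only illustrates the dense -> MoE pattern; the actual
# Parameter-Efficient-MoE implementation differs (parameter-efficient experts).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFromDenseFFN(nn.Module):
    def __init__(self, dense_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 4, top_k: int = 2):
        super().__init__()
        # Every expert is initialized as a copy of the pretrained dense FFN.
        self.experts = nn.ModuleList(
            [copy.deepcopy(dense_ffn) for _ in range(num_experts)]
        )
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden)
        gate_logits = self.router(x)                         # (batch, seq, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)                    # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Wrapped around each FFN block of a dense checkpoint, a layer like this keeps the pretrained weights as the starting point for every expert before instruction tuning on general tasks.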
Alternatives and similar repositories for Parameter-Efficient-MoE
Users interested in Parameter-Efficient-MoE are comparing it to the repositories listed below.
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · May 22, 2024 · Updated last year
- 5X faster 60% less memory QLoRA finetuning ☆21 · May 28, 2024 · Updated last year
- ☆273 · Oct 31, 2023 · Updated 2 years ago
- Repository for CPU Kernel Generation for LLM Inference ☆28 · Jul 13, 2023 · Updated 2 years ago
- LLM-Training-API: Including Embeddings & ReRankers, mergekit, LaserRMT ☆27 · Feb 18, 2024 · Updated last year
- FuseAI Project ☆587 · Jan 25, 2025 · Updated last year
- ☆129 · Jan 22, 2024 · Updated 2 years ago
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition ☆668 · Jul 22, 2024 · Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction' ☆240 · May 26, 2024 · Updated last year
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆114 · May 24, 2024 · Updated last year
- ☆137 · Aug 19, 2024 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,003 · Dec 6, 2024 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆38 · Jun 11, 2025 · Updated 8 months ago
- Codebase for Merging Language Models (ICML 2024) ☆864 · May 5, 2024 · Updated last year
- A library for easily merging multiple LLM experts and efficiently training the merged LLM ☆507 · Aug 26, 2024 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆102Sep 30, 2024Updated last year
- Load multiple LoRA modules simultaneously and automatically switch to the appropriate combination of LoRA modules to generate the best answer ☆159 · Feb 9, 2024 · Updated 2 years ago
- Code for Zero-Shot Tokenizer Transfer ☆142 · Jan 14, 2025 · Updated last year
- Token Omission Via Attention ☆128 · Oct 13, 2024 · Updated last year
- Tools for merging pretrained large language models ☆6,783 · Jan 26, 2026 · Updated 2 weeks ago
- FuseAI Project ☆87 · Jan 25, 2025 · Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 · Feb 27, 2024 · Updated last year
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models ☆454 · Feb 1, 2024 · Updated 2 years ago
- Official PyTorch implementation of QA-LoRA ☆145 · Mar 13, 2024 · Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024) ☆25 · Jul 15, 2025 · Updated 7 months ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts ☆226 · Sep 18, 2025 · Updated 4 months ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) ☆73 · Jun 25, 2024 · Updated last year
- A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI ☆771 · Dec 15, 2023 · Updated 2 years ago
- ☆176 · Jul 22, 2024 · Updated last year
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs ☆887 · Nov 26, 2025 · Updated 2 months ago
- A bagel, with everything. ☆326 · Apr 11, 2024 · Updated last year
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆356 · Nov 20, 2025 · Updated 2 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆640 · Mar 4, 2024 · Updated last year
- Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates ☆473 · Apr 21, 2024 · Updated last year
- ☆130 · Oct 1, 2024 · Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024] ☆588 · Dec 9, 2024 · Updated last year
- Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables ☆21 · May 18, 2025 · Updated 8 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆262 · Apr 23, 2024 · Updated last year
- Code for Scaling Laws of RoPE-based Extrapolation ☆73 · Oct 16, 2023 · Updated 2 years ago