gabrielolympie / moe-prunerLinks

A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size

☆78

Alternatives and similar repositories for moe-pruner

Users that are interested in moe-pruner are comparing it to the libraries listed below

Sorting:

woct0rdho / transformers-qwen3-moe-fused
Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth
☆208Updated 2 weeks ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆61Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆98Updated 6 months ago
wdlctc / headinfer
☆60Updated 6 months ago
18907305772 / FuseAI
FuseAI Project
☆87Updated 9 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆100Updated last year
NickL77 / BaldEagle
3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding
☆78Updated 4 months ago
kyleliang919 / Super_Muon
☆65Updated 8 months ago
Zyphra / Zyda_processing
☆39Updated last year
sebulo / LoQT
☆81Updated last year
SalesforceAIResearch / GemFilter
☆85Updated last week
golololologol / LLM-Distillery
A pipeline for LLM knowledge distillation
☆110Updated 7 months ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆46Updated 11 months ago
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆194Updated last month
snu-mllab / KVzip
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
☆151Updated this week
nyunAI / PruneGPT
☆51Updated last year
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆147Updated last year
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆33Updated last year
OpenMOSE / RWKV-Infer
A large-scale RWKV v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to deploy…
☆45Updated last month
IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆143Updated 9 months ago
SkyworkAI / MindLink
☆98Updated 3 months ago
yynil / RWKVInside
☆39Updated 6 months ago
DeepAuto-AI / hip-attention
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.
☆148Updated 2 weeks ago
Digitous / LLM-SLERP-Merge
Spherical Merge Pytorch/HF format Language Models with minimal feature loss.
☆140Updated 2 years ago
bigai-nlco / TokenSwift
[ICML 2025] |TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
☆118Updated 6 months ago
thepowerfuldeez / OLMo
My fork os allen AI's OLMo for educational purposes.
☆30Updated 11 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202Updated last year
yynil / RWKVinLLAMA
☆17Updated 10 months ago
OpenGVLab / EfficientQAT
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆310Updated 6 months ago
Efficient-ML / Qwen3-Quantization
☆62Updated 2 months ago