woct0rdho / transformers-qwen3-moe-fused
Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quantization, and Unsloth
☆212 · Updated 3 weeks ago
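For context on what "fused" buys here: the stock HF Transformers Qwen3-MoE block loops over experts in Python, launching many small matmuls for whichever tokens each expert received. A fused layer instead groups the routed tokens so each expert's work becomes one contiguous matmul (and, in a real kernel, the per-expert matmuls collapse into a single grouped GEMM on the GPU). The sketch below illustrates only that grouping idea in plain PyTorch; it is not this repository's API or implementation, and the single-matrix "experts" are toy stand-ins for the real gated MLPs.

```python
import torch

def grouped_moe_forward(x, w_experts, router_logits, top_k=2):
    """Illustrative grouped MoE forward pass (not this repo's API).

    x:             (num_tokens, hidden)
    w_experts:     (num_experts, hidden, hidden) -- toy one-matmul "experts"
    router_logits: (num_tokens, num_experts)
    """
    probs = torch.softmax(router_logits, dim=-1)
    weights, expert_idx = torch.topk(probs, top_k)        # (tokens, top_k)
    flat_idx = expert_idx.reshape(-1)                     # one row per (token, slot)
    order = torch.argsort(flat_idx)                       # group rows by expert
    x_rep = x.repeat_interleave(top_k, dim=0)[order]      # token copies, sorted

    # Each expert now owns one contiguous slice -> one matmul per expert,
    # instead of a gather/scatter per expert as in the naive loop.
    out_sorted = torch.empty_like(x_rep)
    counts = torch.bincount(flat_idx, minlength=w_experts.shape[0])
    start = 0
    for e, n in enumerate(counts.tolist()):
        if n:
            out_sorted[start:start + n] = x_rep[start:start + n] @ w_experts[e]
        start += n

    out = torch.empty_like(x_rep)
    out[order] = out_sorted                               # undo the sort
    out = out * weights.reshape(-1, 1)                    # apply routing weights
    return out.reshape(-1, top_k, x.shape[-1]).sum(dim=1) # combine top-k slots

# Toy usage: 8 tokens, hidden size 16, 4 experts.
x = torch.randn(8, 16)
w = torch.randn(4, 16, 16) / 16 ** 0.5
y = grouped_moe_forward(x, w, torch.randn(8, 4))          # -> (8, 16)
```

Sorting the token copies by expert index is what makes each expert's slice contiguous; the scatter through `order` undoes the sort before the top-k outputs are weighted and summed.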
Alternatives and similar repositories for transformers-qwen3-moe-fused
Users interested in transformers-qwen3-moe-fused are comparing it to the libraries listed below.
- A repository aimed at pruning DeepSeek V3, R1, and R1-Zero to a usable size ☆79 · Updated 3 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆196 · Updated 2 months ago
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ☆118 · Updated 6 months ago
- ☆61 · Updated 6 months ago
- ☆300 · Updated 6 months ago
- ☆85 · Updated 8 months ago
- A collection of tricks and tools to speed up transformer models ☆189 · Updated last month
- ☆66 · Updated 8 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆137 · Updated last year
- ☆91 · Updated 6 months ago
- Lightweight toolkit to train and fine-tune 1.58-bit language models ☆100 · Updated 6 months ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆206 · Updated last month
- Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency ☆212 · Updated last week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆223 · Updated last month
- Patches for Hugging Face Transformers to save memory ☆33 · Updated 6 months ago
- ☆148 · Updated last year
- Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? ☆118 · Updated last year
- A pipeline for LLM knowledge distillation ☆110 · Updated 8 months ago
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆153 · Updated last week
- ☆98 · Updated 3 months ago
- ☆73 · Updated 6 months ago
- RWKV-7: Surpassing GPT ☆101 · Updated last year
- A minimal PyTorch re-implementation of Qwen3 VL with a fancy CLI ☆256 · Updated this week
- KV cache compression for high-throughput LLM inference ☆145 · Updated 10 months ago
- Parallel Scaling Law for Language Models: Beyond Parameter and Inference-Time Scaling ☆456 · Updated 6 months ago
- FuseAI Project ☆87 · Updated 10 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆316 · Updated last week
- Minimal GRPO implementation from scratch ☆100 · Updated 8 months ago
- [ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications ☆51 · Updated last month
- Tina: Tiny Reasoning Models via LoRA ☆309 · Updated 2 months ago