woct0rdho / transformers-qwen3-moe-fused
Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth
☆223 · Updated this week
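The core idea behind a fused MoE layer is to replace the per-expert Python loop of a standard Mixture-of-Experts forward pass with a single batched (grouped) matmul. The snippet below is only a conceptual sketch of that idea in plain PyTorch; the function names, tensor shapes, and the use of `torch.bmm` are illustrative assumptions, not this repository's actual API, which relies on optimized grouped kernels.

```python
import torch

def naive_moe(x, expert_w, topk_idx, topk_weight):
    """Reference MoE forward: loop over experts (many small matmuls, poor GPU utilization).

    x:           (tokens, d_in)
    expert_w:    (n_experts, d_in, d_out)  -- one weight matrix per expert
    topk_idx:    (tokens, k)  expert indices chosen by the router
    topk_weight: (tokens, k)  router weights for the chosen experts
    """
    out = x.new_zeros(x.size(0), expert_w.size(-1))
    for e in range(expert_w.size(0)):
        token_ids, slot = (topk_idx == e).nonzero(as_tuple=True)
        if token_ids.numel():
            out[token_ids] += topk_weight[token_ids, slot].unsqueeze(1) * (x[token_ids] @ expert_w[e])
    return out

def fused_moe_sketch(x, expert_w, topk_idx, topk_weight):
    """Same computation expressed as one batched matmul instead of a Python loop."""
    tokens, k = topk_idx.shape
    w = expert_w[topk_idx.reshape(-1)]                   # (tokens*k, d_in, d_out)
    xk = x.repeat_interleave(k, dim=0).unsqueeze(1)      # (tokens*k, 1, d_in)
    y = torch.bmm(xk, w).squeeze(1).view(tokens, k, -1)  # (tokens, k, d_out)
    return (y * topk_weight.unsqueeze(-1)).sum(dim=1)

# Tiny shape check that the two formulations agree
x = torch.randn(6, 16)
expert_w = torch.randn(4, 16, 32)
topk_weight, topk_idx = torch.softmax(torch.randn(6, 4), dim=-1).topk(2, dim=-1)
assert torch.allclose(naive_moe(x, expert_w, topk_idx, topk_weight),
                      fused_moe_sketch(x, expert_w, topk_idx, topk_weight), atol=1e-5)
```

Gathering a full weight matrix per token, as in the sketch, is memory-hungry; practical fused kernels instead sort tokens by expert and run a grouped GEMM over contiguous slices, which is where the training speedup comes from.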
Alternatives and similar repositories for transformers-qwen3-moe-fused
Users interested in transformers-qwen3-moe-fused are comparing it to the libraries listed below.
- A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size ☆82 · Updated 4 months ago
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ☆120 · Updated 7 months ago
- Lightweight toolkit to train and fine-tune 1.58-bit language models ☆106 · Updated 7 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs (see the conceptual MLA sketch after this list) ☆198 · Updated last month
- ☆93 · Updated 7 months ago
- ☆65 · Updated 9 months ago
- ☆63 · Updated 7 months ago
- A collection of tricks and tools to speed up transformer models ☆193 · Updated 3 weeks ago
- RWKV-7: Surpassing GPT ☆103 · Updated last year
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆138 · Updated last year
- A pipeline for LLM knowledge distillation ☆112 · Updated 9 months ago
- Cookbook of SGLang recipes ☆53 · Updated this week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆226 · Updated 2 months ago
- Nano repo for RL training of LLMs ☆70 · Updated 2 months ago
- ☆84 · Updated 9 months ago
- Parallel Scaling Law for Language Models: Beyond Parameter and Inference Time Scaling ☆467 · Updated 7 months ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆216 · Updated 3 months ago
- Tina: Tiny Reasoning Models via LoRA ☆314 · Updated 3 months ago
- ☆99 · Updated 5 months ago
- Block Diffusion for Ultra-Fast Speculative Decoding ☆313 · Updated last week
- ☆508 · Updated 3 weeks ago
- [EMNLP 2025] The official implementation of the paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆102 · Updated 4 months ago
- Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI. ☆251 · Updated 3 months ago
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆183 · Updated 2 weeks ago
- A highly capable, lightweight 2.4B LLM trained on only 1T of pre-training data, with all details released ☆222 · Updated 5 months ago
- LongRoPE is a novel method that extends the context window of pre-trained LLMs to an impressive 2048k tokens. ☆276 · Updated 2 months ago
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆187 · Updated 6 months ago
- [ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications ☆52 · Updated 2 months ago
- DPO, but faster 🚀 (see the DPO loss sketch after this list) ☆46 · Updated last year
- MrlX: A Multi-Agent Reinforcement Learning Framework ☆161 · Updated last month
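For the multi-head latent attention entry above (the "Towards Economical Inference" repository), the sketch below illustrates only the general idea: cache one small shared latent per token instead of full per-head keys and values, and up-project it at attention time. The class and projection names are hypothetical, the RoPE handling used by the actual method is omitted, and nothing here reflects that project's real API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentAttentionSketch(nn.Module):
    """Conceptual MLA: the KV cache stores a low-rank latent, not full keys/values."""

    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress hidden states into the cached latent
        self.k_up = nn.Linear(d_latent, d_model)     # reconstruct per-head keys from the latent
        self.v_up = nn.Linear(d_latent, d_model)     # reconstruct per-head values from the latent
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        B, T, _ = x.shape
        latent = self.kv_down(x)                     # (B, T, d_latent) -- this is all that gets cached
        if latent_cache is not None:                 # decode step: append to the cached latent
            latent = torch.cat([latent_cache, latent], dim=1)

        def split(t):                                # (B, S, d_model) -> (B, heads, S, d_head)
            return t.view(B, -1, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_up(latent)), split(self.v_up(latent))
        # causal mask only for the prefill pass; single-token decode attends to the whole cache
        out = F.scaled_dot_product_attention(q, k, v, is_causal=latent_cache is None)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.o_proj(out), latent              # return the latent as the new cache
```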
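For the "DPO, but faster 🚀" entry, the snippet below is the standard (unoptimized) DPO objective, included only to show what is being accelerated; that repository's specific speedups are not reproduced here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * (policy log-ratio margin - reference margin))."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Inputs are summed per-sequence token log-probabilities, e.g. shape (batch,)
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
```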