woct0rdho / transformers-qwen3-moe-fused
Fused Qwen3 MoE layer for faster training, compatible with Transformers, LoRA, bnb 4-bit quant, and Unsloth. It is also possible to train LoRA over GGUF.
☆235 · Feb 5, 2026 · Updated last week
Alternatives and similar repositories for transformers-qwen3-moe-fused
Users that are interested in transformers-qwen3-moe-fused are comparing it to the libraries listed below
- Auto Thinking Mode switch for Qwen3 in Open WebUI ☆70 · May 8, 2025 · Updated 9 months ago
- Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding ☆39 · Feb 10, 2026 · Updated last week
- Benchmarking DeepSeek R1 API response speeds across different providers for performance comparison. ☆10 · Feb 15, 2025 · Updated last year
- ✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM ☆11 · Jun 16, 2025 · Updated 8 months ago
- Various test models in WNNX format. They can be viewed with `pip install wnetron && wnetron` ☆12 · Jun 22, 2022 · Updated 3 years ago
- ☆26 · Oct 16, 2025 · Updated 4 months ago
- Repository for Skill Set Optimization ☆14 · Jul 26, 2024 · Updated last year
- An abstraction library for building domain-specific intelligent agents based on Large Language Models (LLMs). LLMAgent provides a core ar… ☆28 · Feb 5, 2026 · Updated last week
- EfficientSAM + YOLO World base model for use with Autodistill. ☆10 · Feb 21, 2024 · Updated last year
- A lightweight operating system abstraction layer for agents. ☆17 · Dec 26, 2025 · Updated last month
- ☆13 · Dec 21, 2024 · Updated last year
- Muon fsdp 2 ☆53 · Aug 8, 2025 · Updated 6 months ago
- The official repo for the DanQing dataset. ☆29 · Jan 16, 2026 · Updated last month
- ☆33 · May 12, 2023 · Updated 2 years ago
- A userspace filesystem backed by Apache OpenDAL. ☆34 · Jan 8, 2026 · Updated last month
- ☆13 · Apr 15, 2024 · Updated last year
- Inference RWKV with multiple supported backends. ☆79 · Updated this week
- Model Context Protocol Server for Apache OpenDAL™ ☆34 · Apr 10, 2025 · Updated 10 months ago
- A repository aimed at pruning DeepSeek V3, R1 and R1-Zero to a usable size ☆83 · Sep 5, 2025 · Updated 5 months ago
- Patches for Hugging Face Transformers to save memory ☆34 · Jun 2, 2025 · Updated 8 months ago
- Role-playing based on the RWKV model; actually a version of RWKV_Role_Playing modified beyond recognition ☆17 · Aug 17, 2023 · Updated 2 years ago
- ☆17 · Apr 10, 2024 · Updated last year
- A lightweight Python-based GPU architecture simulator that demonstrates how parallel threads, registers, memory, and instructions work on… ☆42 · Jan 18, 2026 · Updated last month
- DynASM is a Dynamic Assembler for code generation engines. ☆15 · Jan 12, 2015 · Updated 11 years ago
- LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently (ICML2025 Oral) ☆28 · Oct 22, 2025 · Updated 3 months ago
- Code & Data for our Paper "RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation" (EMNLP 2023) ☆17 · Jan 23, 2024 · Updated 2 years ago
- coze api to openai ☆15 · Sep 1, 2024 · Updated last year
- ☆16 · Apr 11, 2024 · Updated last year
- Schedule of developer events in China (focus: open source, developers, cloud native) ☆23 · Jan 30, 2026 · Updated 2 weeks ago
- ☆17 · Sep 29, 2024 · Updated last year
- DPO, but faster 🚀 ☆47 · Dec 6, 2024 · Updated last year
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc… ☆32 · Jun 13, 2024 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆235 · Jun 15, 2025 · Updated 8 months ago
- ☆63 · Jul 10, 2025 · Updated 7 months ago
- The codebase for DBSim ☆16 · Mar 8, 2023 · Updated 2 years ago
- The minimal, ad hoc way to plug and play NebulaGraph with pip install, even inside a Colab notebook! ☆20 · May 24, 2024 · Updated last year
- I, Chen Ping'an, with but a single key, can move mountains, overturn seas, subdue demons, suppress devils, command gods, pluck stars, sever rivers, destroy cities, and split the heavens! ☆22 · Jun 4, 2022 · Updated 3 years ago
- ☆17 · Jan 1, 2025 · Updated last year
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment" ☆75 · May 20, 2025 · Updated 8 months ago