woct0rdho / transformers-qwen3-moe-fused
Fused Qwen3 MoE layer, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth
☆107 · Updated this week
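Since the repo advertises a fused Qwen3 MoE layer that stays compatible with HF Transformers and LoRA, a typical fine-tuning workflow might look like the sketch below. The fused-layer patch call is hypothetical (check the repo's README for the real entry point); the `transformers` and `peft` calls are standard, and the checkpoint name is just one Qwen3 MoE example.

```python
# Minimal sketch: load a Qwen3 MoE checkpoint and attach a LoRA adapter.
# Only the commented-out patch call is specific to this repo, and its
# module/function names are assumptions, not the repo's actual API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen3-30B-A3B"  # any Qwen3 MoE checkpoint

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical: swap in the fused MoE layers before adding adapters.
# from qwen3_moe_fused import patch_model
# model = patch_model(model)

lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```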
Alternatives and similar repositories for transformers-qwen3-moe-fused
Users interested in transformers-qwen3-moe-fused are comparing it to the libraries listed below.
- Lightweight toolkit to train and fine-tune 1.58-bit language models ☆81 · Updated last month
- Automatic thinking-mode switch for Qwen3 in Open WebUI ☆66 · Updated 2 months ago
- ☆142 · Updated 7 months ago
- Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆89 · Updated last week
- ☆52 · Updated last month
- ☆101 · Updated 10 months ago
- CursorCore: Assist Programming through Aligning Anything ☆125 · Updated 4 months ago
- ☆277 · Updated last month
- ☆59 · Updated 3 months ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc. ☆39 · Updated 10 months ago
- ☆48 · Updated 5 months ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆159 · Updated last week
- ☆53 · Updated last year
- A script that automatically toggles Qwen3's reasoning and non-reasoning modes behind an OpenAI-compatible API. The infere… ☆20 · Updated 2 months ago
- Try out HallOumi, a state-of-the-art claim-verification model, in a simple UI! ☆36 · Updated 3 months ago
- A collection of tricks and tools to speed up transformer models ☆170 · Updated last month
- RWKV-7: Surpassing GPT ☆92 · Updated 7 months ago
- Deep Reasoning Translation (DRT) Project ☆225 · Updated last month
- Python implementation of MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) ☆68 · Updated this week
- ☆19 · Updated 4 months ago
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ☆110 · Updated last month
- A repository aimed at pruning DeepSeek V3, R1, and R1-Zero to a usable size ☆61 · Updated 3 months ago
- Simple high-throughput inference library ☆118 · Updated last month
- Fast LLM training codebase with dynamic strategy selection (DeepSpeed + Megatron + FlashAttention + CUDA fusion kernels + compiler) ☆39 · Updated last year
- Multi-Layer Key-Value sharing experiments on Pythia models ☆33 · Updated last year
- Enable tool use for any LLM (DeepSeek V3/R1, etc.) ☆51 · Updated last month
- Patches for Hugging Face Transformers to save memory ☆26 · Updated last month
- Data preparation code for CrystalCoder 7B LLM ☆45 · Updated last year
- GLM Series Edge Models ☆144 · Updated last month
- Kyutai with an "eye" ☆207 · Updated 3 months ago