A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size
☆86 · Sep 5, 2025 · Updated 7 months ago
Alternatives and similar repositories for moe-pruner
Users that are interested in moe-pruner are comparing it to the libraries listed below.
- Direct Preference Optimization for RWKV, aiming for RWKV-5 and 6. ☆11 · Mar 1, 2024 · Updated 2 years ago
- ☆12 · Dec 21, 2024 · Updated last year
- Official Implementation for NorMuon paper ☆66 · Mar 11, 2026 · Updated last month
- ROSA-Tuning ☆71 · Feb 4, 2026 · Updated 2 months ago
- [ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs ☆20 · Jun 3, 2025 · Updated 10 months ago
- ☆41 · Apr 30, 2025 · Updated last year
- ☆17 · Jan 1, 2025 · Updated last year
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression ☆82 · Mar 25, 2025 · Updated last year
- ☆28 · Aug 27, 2025 · Updated 8 months ago
- Official PyTorch implementation of CD-MOE ☆12 · Mar 18, 2026 · Updated last month
- RWKV centralised docs for the community ☆32 · Jan 17, 2026 · Updated 3 months ago
- A 20M RWKV v6 can do nonogram ☆13 · Oct 18, 2024 · Updated last year
- Official Chinese documentation for RWKV | RWKV官方中文文档 ☆15 · Apr 16, 2026 · Updated 2 weeks ago
- Lottery Ticket Adaptation ☆40 · Nov 20, 2024 · Updated last year
- RWKV v5,v6 LoRA Trainer on Cuda and Rocm Platform. RWKV is a RNN with transformer-level LLM performance. It can be directly trained like … ☆13 · Mar 24, 2024 · Updated 2 years ago
- ☆13 · Mar 23, 2025 · Updated last year
- "Robust Attributed Graph Alignment via Joint Structure Learning and Optimal Transport" in ICDE 2023 ☆18 · Oct 23, 2023 · Updated 2 years ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 · Feb 9, 2025 · Updated last year
- Language modeling with linear-cost context ☆118 · Sep 25, 2025 · Updated 7 months ago
- ☆19 · Sep 29, 2024 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆248 · Jun 15, 2025 · Updated 10 months ago
- FinMTEB: Finance Massive Text Embedding Benchmark (EMNLP 2025 Main) ☆55 · Nov 15, 2025 · Updated 5 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆268 · Apr 23, 2024 · Updated 2 years ago
- ☆27 · Apr 14, 2025 · Updated last year
- Course Project for COMP4471 on RWKV ☆17 · Feb 11, 2024 · Updated 2 years ago
- Simple GRPO scripts and configurations. ☆59 · Feb 6, 2025 · Updated last year
- A program that allows you to chat on VRChat using ChatGPT. ☆15 · Mar 22, 2023 · Updated 3 years ago
- Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton ☆48 · Apr 2, 2026 · Updated 3 weeks ago
- The WorldRWKV project aims to implement training and inference across various modalities using the RWKV7 architecture. By leveraging diff… ☆67 · Mar 18, 2026 · Updated last month
- This is a repository of Binary General Matrix Multiply (BGEMM) by customized CUDA kernel. Thank FP6-LLM for the wheels! ☆20 · Aug 30, 2024 · Updated last year
- ☆28 · Oct 7, 2025 · Updated 6 months ago
- [ACL 2026 Main] Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis ☆38 · Apr 24, 2026 · Updated last week
- Blindspots in LLMs I've noticed while AI coding. Sonnet family emphasis. ☆13 · Mar 20, 2025 · Updated last year
- rule matcher (context free grammar) ☆10 · Dec 27, 2019 · Updated 6 years ago
- Control LLM ☆23 · Apr 6, 2025 · Updated last year
- ☆64 · Oct 17, 2023 · Updated 2 years ago
- a within-document event coreference resolution system, trained and evaluated on the KBP corpus. ☆10 · May 15, 2023 · Updated 2 years ago
- This repo is a reproduction of Channel_pruning ☆11 · May 17, 2018 · Updated 7 years ago
- End-To-End SpeechSynthesis system with knowledge distillation ☆17 · Jul 16, 2022 · Updated 3 years ago