A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size
☆84 · Sep 5, 2025 · Updated 6 months ago
Alternatives and similar repositories for moe-pruner
Users that are interested in moe-pruner are comparing it to the libraries listed below
- Direct Preference Optimization for RWKV, aiming for RWKV-5 and 6. ☆11 · Mar 1, 2024 · Updated 2 years ago
- ☆13 · Dec 21, 2024 · Updated last year
- ROSA-Tuning ☆70 · Feb 4, 2026 · Updated last month
- ☆41 · Apr 30, 2025 · Updated 10 months ago
- ☆17 · Jan 1, 2025 · Updated last year
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆37 · Oct 9, 2025 · Updated 5 months ago
- Mini Model Daemon ☆12 · Nov 9, 2024 · Updated last year
- A toy text-to-image model trained from scratch. ☆19 · Jun 9, 2025 · Updated 9 months ago
- ☆28 · Aug 27, 2025 · Updated 6 months ago
- Continuous batching and parallel acceleration for RWKV6 ☆22 · Jun 28, 2024 · Updated last year
- Official PyTorch implementation of CD-MOE ☆12 · Mar 13, 2026 · Updated last week
- Telegram bot which can work with both openAI and LocalAI modes, it also uses UncensoredGPT models like Wizard-Uncensored. It can be launc… ☆20 · Mar 14, 2025 · Updated last year
- ☆16 · Nov 23, 2023 · Updated 2 years ago
- RWKV centralised docs for the community ☆32 · Jan 17, 2026 · Updated 2 months ago
- A 20M RWKV v6 can do nonogram ☆14 · Oct 18, 2024 · Updated last year
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc… ☆32 · Mar 9, 2026 · Updated last week
- RWKV v5,v6 LoRA Trainer on Cuda and Rocm Platform. RWKV is an RNN with transformer-level LLM performance. It can be directly trained like … ☆13 · Mar 24, 2024 · Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 · Feb 9, 2025 · Updated last year
- Build llama inference compute from scratch, only using torch/numpy base ops ☆12 · Aug 1, 2025 · Updated 7 months ago
- ☆26 · Apr 14, 2025 · Updated 11 months ago
- Language modeling with linear-cost context ☆117 · Sep 25, 2025 · Updated 5 months ago
- Demonstration of a factory pattern where the types automatically register themselves ☆13 · Mar 13, 2019 · Updated 7 years ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆242 · Jun 15, 2025 · Updated 9 months ago
- ☆18 · Sep 29, 2024 · Updated last year
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆263 · Apr 23, 2024 · Updated last year
- FinMTEB: Finance Massive Text Embedding Benchmark (EMNLP 2025 Main) ☆53 · Nov 15, 2025 · Updated 4 months ago
- Course Project for COMP4471 on RWKV ☆17 · Feb 11, 2024 · Updated 2 years ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆22 · Oct 14, 2025 · Updated 5 months ago
- ☆17 · Feb 6, 2025 · Updated last year
- Simple GRPO scripts and configurations. ☆59 · Feb 6, 2025 · Updated last year
- A program that allows you to chat on VRChat using ChatGPT. ☆15 · Mar 22, 2023 · Updated 2 years ago
- Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference ☆35 · Mar 6, 2025 · Updated last year
- ☆28 · Oct 7, 2025 · Updated 5 months ago
- ☆12 · Jun 2, 2025 · Updated 9 months ago
- Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton ☆48 · Aug 22, 2025 · Updated 6 months ago
- The WorldRWKV project aims to implement training and inference across various modalities using the RWKV7 architecture. By leveraging diff… ☆66 · Updated this week
- A benchmark of programming tasks for LLMs that supports almost any programming language. ☆13 · Jun 30, 2025 · Updated 8 months ago
- ☆17 · Mar 28, 2025 · Updated 11 months ago
- This is a repository of Binary General Matrix Multiply (BGEMM) by customized CUDA kernel. Thank FP6-LLM for the wheels! ☆18 · Aug 30, 2024 · Updated last year