A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size
☆84 Sep 5, 2025 Updated 7 months ago
Alternatives and similar repositories for moe-pruner
Users who are interested in moe-pruner are comparing it to the libraries listed below.
- Direct Preference Optimization for RWKV, aiming for RWKV-5 and 6. ☆11 Mar 1, 2024 Updated 2 years ago
- ☆13 Dec 21, 2024 Updated last year
- This is an Android App. Now with 100% less bugs. ☆10 Sep 26, 2019 Updated 6 years ago
- Official Implementation for NorMuon paper ☆65 Mar 11, 2026 Updated 3 weeks ago
- [ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs ☆20 Jun 3, 2025 Updated 10 months ago
- ☆41 Apr 30, 2025 Updated 11 months ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆37 Oct 9, 2025 Updated 6 months ago
- Mini Model Daemon ☆12 Nov 9, 2024 Updated last year
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression ☆78 Mar 25, 2025 Updated last year
- ☆28 Aug 27, 2025 Updated 7 months ago
- Continuous batching and parallel acceleration for RWKV6 ☆22 Jun 28, 2024 Updated last year
- Official PyTorch implementation of CD-MOE ☆12 Mar 18, 2026 Updated 3 weeks ago
- [NeurIPS'22] What Makes a "Good" Data Augmentation in Knowledge Distillation -- A Statistical Perspective ☆37 Dec 15, 2022 Updated 3 years ago
- A 20M RWKV v6 can do nonogram ☆14 Oct 18, 2024 Updated last year
- Official Chinese documentation for RWKV | RWKV官方中文文档 ☆15 Mar 27, 2026 Updated last week
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc… ☆35 Mar 9, 2026 Updated last month
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆14 Mar 30, 2024 Updated 2 years ago
- Lottery Ticket Adaptation ☆40 Nov 20, 2024 Updated last year
- RWKV v5,v6 LoRA Trainer on Cuda and Rocm Platform. RWKV is a RNN with transformer-level LLM performance. It can be directly trained like … ☆13 Mar 24, 2024 Updated 2 years ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 May 4, 2025 Updated 11 months ago
- ☆12 Mar 23, 2025 Updated last year
- "Robust Attributed Graph Alignment via Joint Structure Learning and Optimal Transport" in ICDE 2023 ☆18 Oct 23, 2023 Updated 2 years ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 Feb 9, 2025 Updated last year
- Language modeling with linear-cost context ☆117 Sep 25, 2025 Updated 6 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆248 Jun 15, 2025 Updated 9 months ago
- ☆21 Nov 26, 2025 Updated 4 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆263 Apr 23, 2024 Updated last year
- ☆27 Apr 14, 2025 Updated 11 months ago
- Course Project for COMP4471 on RWKV ☆17 Feb 11, 2024 Updated 2 years ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆22 Oct 14, 2025 Updated 5 months ago
- Simple GRPO scripts and configurations. ☆59 Feb 6, 2025 Updated last year
- A program that allows you to chat on VRChat using ChatGPT. ☆15 Mar 22, 2023 Updated 3 years ago
- Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton ☆48 Apr 2, 2026 Updated last week
- The WorldRWKV project aims to implement training and inference across various modalities using the RWKV7 architecture. By leveraging diff… ☆66 Mar 18, 2026 Updated 3 weeks ago
- A benchmark of programming tasks for LLMs that supports almost any programming language. ☆13 Jun 30, 2025 Updated 9 months ago
- ☆17 Mar 28, 2025 Updated last year
- ☆28 Oct 7, 2025 Updated 6 months ago
- This is a repository of Binary General Matrix Multiply (BGEMM) by customized CUDA kernel. Thank FP6-LLM for the wheels! ☆19 Aug 30, 2024 Updated last year
- Blindspots in LLMs I've noticed while AI coding. Sonnet family emphasis. ☆13 Mar 20, 2025 Updated last year