A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size
☆88 Sep 5, 2025 Updated 9 months ago
Alternatives and similar repositories for moe-pruner
Users that are interested in moe-pruner are comparing it to the libraries listed below.
- A simple Fast API Backend for Ironclad/rivet ☆26 Jan 9, 2024 Updated 2 years ago
- Direct Preference Optimization for RWKV, aiming for RWKV-5 and 6. ☆11 Mar 1, 2024 Updated 2 years ago
- ☆12 Dec 21, 2024 Updated last year
- Repo for the NFL analysis project. ☆13 Feb 20, 2024 Updated 2 years ago
- [ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs ☆20 Jun 3, 2025 Updated last year
- ROSA-Tuning ☆74 Feb 4, 2026 Updated 4 months ago
- ☆41 Apr 30, 2025 Updated last year
- ☆17 Jan 1, 2025 Updated last year
- Mini Model Daemon ☆13 Nov 9, 2024 Updated last year
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression ☆82 Mar 25, 2025 Updated last year
- ☆29 Aug 27, 2025 Updated 9 months ago
- continuous batching and parallel acceleration for RWKV6 ☆22 Jun 28, 2024 Updated last year
- A toy text-to-image model trained from scratch. ☆19 Jun 9, 2025 Updated last year
- Official PyTorch implementation of CD-MOE ☆12 Mar 18, 2026 Updated 2 months ago
- HFODetector is a Python package that is capable of detecting HFOs with STE / MNI / Hilbert detector. Detection speed is increased by u… ☆13 Feb 16, 2025 Updated last year
- [NeurIPS'22] What Makes a "Good" Data Augmentation in Knowledge Distillation -- A Statistical Perspective ☆37 Dec 15, 2022 Updated 3 years ago
- RWKV centralised docs for the community ☆34 Jan 17, 2026 Updated 4 months ago
- A 20M RWKV v6 can do nonogram ☆13 Oct 18, 2024 Updated last year
- Official Chinese documentation for RWKV | RWKV官方中文文档 ☆15 May 20, 2026 Updated 3 weeks ago
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc… ☆35 Mar 9, 2026 Updated 3 months ago
- Lottery Ticket Adaptation ☆40 Nov 20, 2024 Updated last year
- RWKV v5,v6 LoRA Trainer on Cuda and Rocm Platform. RWKV is a RNN with transformer-level LLM performance. It can be directly trained like … ☆13 Mar 24, 2024 Updated 2 years ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 Feb 9, 2025 Updated last year
- The application of large pre-trained vision model DINOv2 from MetaAI for feature points matching, and a ViT decoder used for Auto Encoder ☆18 Apr 27, 2023 Updated 3 years ago
- Demonstration of a factory pattern where the types automatically register themselves ☆13 Mar 13, 2019 Updated 7 years ago
- ☆19 Sep 29, 2024 Updated last year
- build llama inference compute from scratch, only using torch/numpy base ops ☆16 May 5, 2026 Updated last month
- FinMTEB: Finance Massive Text Embedding Benchmark (EMNLP 2025 Main) ☆55 Nov 15, 2025 Updated 6 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆251 Jun 15, 2025 Updated 11 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆267 Apr 23, 2024 Updated 2 years ago
- ☆27 Apr 14, 2025 Updated last year
- Course Project for COMP4471 on RWKV ☆17 Feb 11, 2024 Updated 2 years ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆22 Oct 14, 2025 Updated 8 months ago
- Simple GRPO scripts and configurations. ☆59 Feb 6, 2025 Updated last year
- A program that allows you to chat on VRChat using ChatGPT. ☆15 Mar 22, 2023 Updated 3 years ago
- langchain opentutorial utility package for Open Tutorial ☆10 Feb 2, 2025 Updated last year
- ☆12 Jun 2, 2025 Updated last year
- Various LLM Benchmarks ☆26 Feb 20, 2026 Updated 3 months ago
- The WorldRWKV project aims to implement training and inference across various modalities using the RWKV7 architecture. By leveraging diff… ☆68 Mar 18, 2026 Updated 2 months ago