gabrielolympie / moe-pruner
A repository aimed at pruning DeepSeek V3, R1, and R1-Zero to a usable size
☆58 · Updated 2 months ago
Alternatives and similar repositories for moe-pruner
Users interested in moe-pruner are comparing it to the repositories listed below
- ☆53 · Updated last year
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆72 · Updated 2 weeks ago
- QuIP quantization ☆52 · Updated last year
- The official code repo and data hub of top_nsigma sampling strategy for LLMs. ☆25 · Updated 3 months ago
- ☆83 · Updated 3 weeks ago
- A pipeline for LLM knowledge distillation ☆104 · Updated 2 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs ☆85 · Updated last month
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆78 · Updated last year
- Spherical Merge Pytorch/HF format Language Models with minimal feature loss. ☆124 · Updated last year
- FuseAI Project ☆87 · Updated 4 months ago
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 6 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆143 · Updated 8 months ago
- A repository for research on medium sized language models. ☆76 · Updated last year
- ☆47 · Updated 9 months ago
- ☆34 · Updated 11 months ago
- An Experiment on Dynamic NTK Scaling RoPE ☆64 · Updated last year
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆55 · Updated 8 months ago
- ☆20 · Updated last month
- ☆33 · Updated last month
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆155 · Updated this week
- ☆79 · Updated 4 months ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆103 · Updated 2 years ago
- Fast LLM training codebase with dynamic strategy selection [DeepSpeed + Megatron + FlashAttention + CUDA fusion kernels + compiler] ☆37 · Updated last year
- Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluati… ☆43 · Updated last month
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆132 · Updated 11 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆32 · Updated 9 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆121 · Updated 4 months ago
- 3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding ☆58 · Updated last week
- ☆17 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆32 · Updated 3 months ago