gabrielolympie / moe-pruner
A repository aimed at pruning DeepSeek V3, R1, and R1-Zero to a usable size
☆52 · Updated last month
Alternatives and similar repositories for moe-pruner
Users interested in moe-pruner are comparing it to the libraries listed below
- ☆53 · Updated 11 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆143 · Updated 7 months ago
- FuseAI Project ☆86 · Updated 3 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆77 · Updated last year
- A repository for research on medium sized language models. ☆76 · Updated 11 months ago
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper ☆136 · Updated 9 months ago
- ☆45 · Updated 2 months ago
- This is the official repository for Inheritune. ☆111 · Updated 3 months ago
- Easy to use, High Performant Knowledge Distillation for LLMs ☆81 · Updated last week
- ☆33 · Updated 10 months ago
- RWKV-7: Surpassing GPT ☆84 · Updated 6 months ago
- ☆17 · Updated last year
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆165 · Updated this week
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆32 · Updated 9 months ago
- ☆78 · Updated 4 months ago
- ☆55 · Updated last month
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 5 months ago
- The official code repo and data hub of top_nsigma sampling strategy for LLMs. ☆24 · Updated 3 months ago
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆37 · Updated last year
- A pipeline for LLM knowledge distillation ☆102 · Updated last month
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆146 · Updated 3 weeks ago
- Code for preprint "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)" ☆38 · Updated last week
- An Experiment on Dynamic NTK Scaling RoPE ☆64 · Updated last year
- ☆17 · Updated 4 months ago
- Spherical Merge Pytorch/HF format Language Models with minimal feature loss. ☆121 · Updated last year
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆55 · Updated 7 months ago
- [ICML 2025] From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation ☆90 · Updated last week
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · Updated last week
- DPO, but faster 🚀 ☆42 · Updated 5 months ago
- ☆85 · Updated 6 months ago