gabrielolympie / moe-pruner
A repository aimed at pruning DeepSeek V3, R1, and R1-Zero down to a usable size
☆58 · Updated 2 months ago
Alternatives and similar repositories for moe-pruner
Users interested in moe-pruner are comparing it to the repositories listed below
- Lightweight toolkit to train and fine-tune 1.58-bit language models ☆80 · Updated last month
- ☆53 · Updated last year
- ☆35 · Updated last year
- Evaluating LLMs with Dynamic Data ☆93 · Updated last month
- FuseAI Project ☆87 · Updated 5 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆144 · Updated 9 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆78 · Updated last year
- QuIP quantization ☆54 · Updated last year
- Implementation of the paper 'LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens' ☆137 · Updated 11 months ago
- The official code repo and data hub for the top_nsigma sampling strategy for LLMs. ☆26 · Updated 4 months ago
- Code for the preprint "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)" ☆39 · Updated last month
- ☆47 · Updated 9 months ago
- Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆86 · Updated 2 weeks ago
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 6 months ago
- RWKV-7: Surpassing GPT ☆92 · Updated 7 months ago
- A pipeline for LLM knowledge distillation ☆104 · Updated 2 months ago
- Longitudinal Evaluation of LLMs via Data Compression ☆32 · Updated last year
- ☆17 · Updated last year
- This is a personal reimplementation of Google's Infini-transformer, using a small 2B model. The project includes both model and train… ☆57 · Updated last year
- Fast LLM training codebase with dynamic strategy selection [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆39 · Updated last year
- ☆56 · Updated 3 months ago
- ☆80 · Updated 5 months ago
- Spherically merge PyTorch/HF-format language models with minimal feature loss. ☆129 · Updated last year
- A repository for research on medium-sized language models. ☆76 · Updated last year
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- 3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding ☆66 · Updated last week
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆133 · Updated last year
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆47 · Updated 2 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs ☆86 · Updated last month
- An Experiment on Dynamic NTK Scaling RoPE ☆64 · Updated last year