gabrielolympie / moe-pruner
A repository aimed at pruning DeepSeek V3, R1, and R1-Zero to a usable size
☆44 · Updated 2 weeks ago
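The repository targets post-training expert pruning for DeepSeek's Mixture-of-Experts checkpoints. As a rough illustration of the general idea (not moe-pruner's actual method), the sketch below prunes a toy top-1 MoE layer by keeping only the experts that a calibration batch routes to most often; `ToyMoE`, `prune_experts`, and all sizes are hypothetical stand-ins.

```python
# Illustrative only: a toy MoE layer pruned by dropping the experts that the
# router selects least often on calibration data. Not moe-pruner's algorithm.
import torch
import torch.nn as nn


class ToyMoE(nn.Module):
    """Minimal top-1 MoE block, used purely to illustrate expert pruning."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        idx = self.router(x).argmax(dim=-1)  # top-1 expert id per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out


@torch.no_grad()
def prune_experts(moe: ToyMoE, calib: torch.Tensor, keep: int) -> ToyMoE:
    """Keep the `keep` most frequently routed experts and remap the router rows."""
    counts = torch.bincount(moe.router(calib).argmax(dim=-1), minlength=len(moe.experts))
    kept = counts.topk(keep).indices.sort().values  # expert ids to retain
    pruned = ToyMoE(moe.router.in_features, keep)
    pruned.router.weight.copy_(moe.router.weight[kept])  # router rows of kept experts
    for new_i, old_i in enumerate(kept.tolist()):
        pruned.experts[new_i].load_state_dict(moe.experts[old_i].state_dict())
    return pruned


if __name__ == "__main__":
    torch.manual_seed(0)
    moe = ToyMoE(d_model=16, n_experts=8)
    calib = torch.randn(1024, 16)            # stand-in calibration activations
    smaller = prune_experts(moe, calib, keep=4)
    print(smaller(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```

A real pipeline would gather routing statistics layer by layer across the full model and typically fine-tune or distill afterwards to recover quality; the sketch only shows the counting-and-remapping step.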
Alternatives and similar repositories for moe-pruner:
Users interested in moe-pruner are comparing it to the repositories listed below.
- FuseAI Project · ☆85 · Updated 3 months ago
- A personal reimplementation of Google's Infini-Transformer using a small 2B model. The project includes both model and train… · ☆56 · Updated last year
- ☆56 · Updated last week
- An Experiment on Dynamic NTK Scaling RoPE · ☆63 · Updated last year
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) · ☆102 · Updated last month
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” · ☆121 · Updated 3 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs · ☆161 · Updated this week
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models · ☆40 · Updated 5 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" · ☆139 · Updated this week
- ☆27 · Updated 2 months ago
- Fast LLM training codebase with dynamic strategy selection [DeepSpeed + Megatron + FlashAttention + CudaFusionKernel + Compiler] · ☆36 · Updated last year
- The official implementation of the paper SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction · ☆45 · Updated 6 months ago
- Paper list on efficient Mixture-of-Experts (MoE) for LLMs · ☆62 · Updated 4 months ago
- The official code repository and data hub for the top_nsigma sampling strategy for LLMs · ☆24 · Updated 2 months ago
- Code for preprint "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)" · ☆36 · Updated last month
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models · ☆131 · Updated 10 months ago
- ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory · ☆91 · Updated 3 weeks ago
- LMTuner: Make the LLM Better for Everyone · ☆35 · Updated last year
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation · ☆48 · Updated 9 months ago
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation · ☆88 · Updated last month
- ☆33 · Updated 10 months ago
- Code for paper "Patch-Level Training for Large Language Models" · ☆82 · Updated 5 months ago
- Linear Attention Sequence Parallelism (LASP) · ☆82 · Updated 10 months ago
- ☆20 · Updated last week
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch… · ☆55 · Updated this week
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) · ☆40 · Updated last year
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context · ☆31 · Updated 8 months ago
- ☆34 · Updated 9 months ago
- Code for Scaling Laws of RoPE-based Extrapolation · ☆73 · Updated last year
- ☆48 · Updated last year