weigao266 / Awesome-Efficient-Arch
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
☆357 · Updated this week
Alternatives and similar repositories for Awesome-Efficient-Arch
Users interested in Awesome-Efficient-Arch are comparing it to the repositories listed below.
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆639 · Updated 3 weeks ago
- TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight) ☆407 · Updated last month
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆193 · Updated last month
- Parallel Scaling Law for Language Models — Beyond Parameter and Inference Time Scaling ☆450 · Updated 5 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆499 · Updated 9 months ago
- ☆431 · Updated 3 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆144 · Updated 7 months ago
- Efficient LLM Inference over Long Sequences ☆390 · Updated 4 months ago
- ☆205 · Updated 2 weeks ago
- Efficient Mixture of Experts for LLM Paper List ☆143 · Updated last month
- Implementation of FP8/INT8 rollout for RL training without performance drop. ☆268 · Updated last week
- The official repo of "One RL to See Them All: Visual Triple Unified Reinforcement Learning" ☆328 · Updated 5 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆223 · Updated last week
- DeepSeek Native Sparse Attention PyTorch implementation ☆107 · Updated last week
- 青稞Talk ☆160 · Updated last week
- siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems ☆224 · Updated this week
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". ☆275 · Updated 8 months ago
- [Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. ☆495 · Updated last week
- VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo ☆1,283 · Updated this week
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…) ☆176 · Updated 2 months ago
- [TMLR 2025] Efficient Reasoning Models: A Survey ☆276 · Updated 2 weeks ago
- ☆817 · Updated 5 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data, with all details. ☆222 · Updated 3 months ago
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models ☆179 · Updated 10 months ago
- [NeurIPS 2025 Spotlight] TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425) ☆422 · Updated 3 weeks ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆820 · Updated this week
- ☆973 · Updated last month
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr… ☆259 · Updated this week
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ☆322 · Updated 6 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆245 · Updated 4 months ago