weigao266 / Awesome-Efficient-Arch

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

☆368

Alternatives and similar repositories for Awesome-Efficient-Arch

Users who are interested in Awesome-Efficient-Arch are comparing it to the libraries listed below.

NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆725 · Updated last week
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆196 · Updated 2 months ago
MuLabPKU / TransMLA
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
☆413 · Updated 2 months ago
QwenLM / ParScale
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
☆460 · Updated 6 months ago
stepfun-ai / Step3
☆439 · Updated 3 months ago
step-law / steplaw
☆207 · Updated last month
yaof20 / Flash-RL
Implementation of FP8/INT8 rollout for RL training without performance drop.
☆279 · Updated last month
pprp / Awesome-Efficient-MoE
Efficient Mixture of Experts for LLM Paper List
☆145 · Updated 2 months ago
GAIR-NLP / MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
☆145 · Updated 7 months ago
qingkelab / qingketalk
青稞Talk
☆169 · Updated 2 weeks ago
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆392 · Updated 5 months ago
mit-han-lab / duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆510 · Updated 9 months ago
MiniMax-AI / One-RL-to-See-Them-All
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
☆328 · Updated 6 months ago
maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆186 · Updated 3 weeks ago
RUC-GSAI / YuLan-Mini
A highly capable, lightweight 2.4B LLM that uses only 1T of pre-training data, with all details released.
☆221 · Updated 4 months ago
ZihanWang314 / CoE
Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models
☆223 · Updated last month
JinjieNi / MegaDLMs
GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr…
☆289 · Updated 3 weeks ago
radixark / miles
☆344 · Updated this week
FlagAI-Open / OpenSeek
OpenSeek aims to unite the global open source community to drive collaborative innovation in algorithms, data and systems to develop next…
☆240 · Updated 3 weeks ago
rlite-project / RLite
A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…
☆90 · Updated 3 months ago
zhijie-group / Discrete-Diffusion-Forcing
Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference
☆205 · Updated 2 months ago
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆256 · Updated 5 months ago
RyanLiu112 / compute-optimal-tts
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
☆276 · Updated 9 months ago
LCLM-Horizon / A-Comprehensive-Survey-For-Long-Context-Language-Modeling
A Comprehensive Survey on Long Context Language Modeling
☆209 · Updated 2 weeks ago
openpsi-project / ReaLHF
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
☆326 · Updated 7 months ago
MiroMindAI / MiroRL
MiroRL is an MCP-first reinforcement learning framework for deep research agents.
☆180 · Updated 3 months ago
mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
☆84 · Updated last month
MiroMindAI / MiroMind-M1
MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning.
☆245 · Updated 3 months ago
fscdc / Awesome-Efficient-Reasoning-Models
[TMLR 2025] Efficient Reasoning Models: A Survey
☆282 · Updated last month
NVIDIA-NeMo / Megatron-Bridge
HuggingFace conversion and training library for Megatron-based models
☆250 · Updated this week