qingkelab / qingkelab.github.ioLinks

青稞社区

☆26

Alternatives and similar repositories for qingkelab.github.io

Users that are interested in qingkelab.github.io are comparing it to the libraries listed below

Sorting:

NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆221Updated last month
TencentARC / mllm-npu
mllm-npu: training multimodal large language models on Ascend NPUs
☆91Updated 11 months ago
stepfun-ai / Step3
☆326Updated last week
pprp / Awesome-Efficient-MoE
Efficient Mixture of Experts for LLM Paper List
☆87Updated 7 months ago
mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
☆70Updated 3 months ago
NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆320Updated this week
OpenBMB / infllmv2_cuda_impl
☆39Updated 2 months ago
maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆132Updated this week
ThisisBillhe / ZipCache
[NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
☆23Updated 4 months ago
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆213Updated last month
sii-research / siiRL
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems
☆152Updated this week
mit-han-lab / Block-Sparse-Attention
A sparse attention kernel supporting mix sparse patterns
☆262Updated 5 months ago
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆124Updated 2 months ago
Gaffey / ExCP
Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking".
☆48Updated last year
dhcode-cpp / NSA-pytorch
DeepSeek Native Sparse Attention pytorch implementation
☆83Updated 5 months ago
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆103Updated 4 months ago
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆186Updated 2 months ago
hao-ai-lab / Awesome-Video-Attention
A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach…
☆32Updated 2 weeks ago
InternLM / turbomind
☆92Updated 4 months ago
RiseAI-Sys / DAX
High performance inference engine for diffusion models
☆26Updated last week
microsoft / chunk-attention
☆78Updated 3 months ago
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆102Updated 3 months ago
GAIR-NLP / MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
☆138Updated 3 months ago
thu-nics / MoA
[CoLM'25] The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression>
☆145Updated 3 weeks ago
Dao-AILab / grouped-latent-attention
☆123Updated 2 months ago
microsoft / AttentionEngine
☆75Updated 2 months ago
SkyworkAI / Skywork-MoE
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
☆136Updated last year
microsoft / SeerAttention
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆141Updated last week
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆42Updated last month
madsys-dev / deepseekv2-profile
☆145Updated 5 months ago