ifromeast / AI_analysis
Analyse problems of AI with Math and Code
☆13 · Updated this week
Alternatives and similar repositories for AI_analysis
Users interested in AI_analysis are comparing it to the libraries listed below.
- ☆19 · Updated 4 months ago
- More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression · ☆11 · Updated 4 months ago
- Implementations of several LLM KV cache sparsity methods · ☆32 · Updated 11 months ago
- LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification · ☆53 · Updated 2 months ago
- A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache · ☆34 · Updated 3 weeks ago
- SQUEEZED ATTENTION: Accelerating Long Prompt LLM Inference · ☆47 · Updated 5 months ago
- The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference · ☆74 · Updated 3 months ago
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models · ☆19 · Updated 7 months ago
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference · ☆37 · Updated 11 months ago
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention · ☆37 · Updated last month
- ATC23 AE · ☆45 · Updated 2 years ago
- Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference · ☆19 · Updated 2 months ago
- ☆82 · Updated last week
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) · ☆58 · Updated last month
- LLM Inference with Microscaling Format · ☆22 · Updated 6 months ago
- Explore Inter-layer Expert Affinity in MoE Model Inference · ☆9 · Updated last year
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length · ☆81 · Updated last month
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth Is Rarely Pure and Never Simple · ☆24 · Updated 3 weeks ago
- ☆49 · Updated 5 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs · ☆99 · Updated this week
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference · ☆36 · Updated last month
- ☆54 · Updated last year
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs · ☆44 · Updated last month
- Quantized Attention on GPU · ☆45 · Updated 5 months ago
- Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" · ☆44 · Updated 11 months ago
- [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference · ☆20 · Updated last month
- ☆58 · Updated 3 weeks ago
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…" · ☆55 · Updated 10 months ago
- ☆125 · Updated 2 weeks ago
- 16-fold memory access reduction with nearly no loss · ☆94 · Updated last month