SusCom-Lab / ZSMerge

☆18

Alternatives and similar repositories for ZSMerge

Users who are interested in ZSMerge are comparing it to the libraries listed below.

Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated 10 months ago
OswaldHe / HMT-pytorch
[NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"
☆75 Updated 4 months ago
NolanoOrg / SpectraSuite
☆51 Updated last year
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆87 Updated last year
lfsszd / CS-Drafting
Cascade Speculative Drafting
☆31 Updated last year
wdlctc / headinfer
☆58 Updated 5 months ago
UmerHA / triton_util
Make Triton easier
☆48 Updated last year
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆155 Updated 6 months ago
66RING / CritiPrefill
Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".
☆16 Updated last year
royeisen / reasoning_loading_bar
☆51 Updated 3 months ago
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆42 Updated last year
mistralai / mistral-evals
☆77 Updated 2 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78 Updated last year
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆33 Updated last year
IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆26 Updated 2 years ago
hetailang / SqueezeAttention
☆38 Updated last year
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆135 Updated 4 months ago
scitix / MEAP
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
☆33 Updated 5 months ago
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60 Updated last year
Infini-AI-Lab / gsm_infinite
☆55 Updated 4 months ago
shuzhangzhong / HybriMoE-Preview
☆17 Updated 6 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆102 Updated 2 weeks ago
wdlctc / mini-s
☆52 Updated last year
JarvisPei / CMoE
Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
☆25 Updated 7 months ago
ScalingIntelligence / good-kernels
Samples of good AI generated CUDA kernels
☆91 Updated 4 months ago
Scientific-Computing-Lab / MPI-rigen
MPI Code Generation through Domain-Specific Language Models
☆14 Updated 11 months ago
zenrran4nlp / Awesome-LLM-Inference-Serving
☆43 Updated 6 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆98 Updated 11 months ago
uclaml / COPS
The official implementation of Cross-Task Experience Sharing (COPS)
☆29 Updated last year
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40 Updated last year