SusCom-Lab / ZSMerge
☆19 · Updated 2 months ago
Alternatives and similar repositories for ZSMerge
Users interested in ZSMerge are comparing it to the repositories listed below.
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆88 · Updated last year
- DPO, but faster 🚀 ☆46 · Updated last year
- A repository for research on medium-sized language models. ☆77 · Updated last year
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models ☆16 · Updated last month
- ☆66 · Updated 8 months ago
- ☆63 · Updated 7 months ago
- RWKV-7: Surpassing GPT ☆101 · Updated last year
- GoldFinch and other hybrid transformer components ☆45 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 · Updated 9 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last year
- Simple high-throughput inference library ☆152 · Updated 7 months ago
- Make Triton easier ☆49 · Updated last year
- Data preparation code for CrystalCoder 7B LLM ☆45 · Updated last year
- Train, tune, and infer Bamba model ☆137 · Updated 6 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆27 · Updated 2 years ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆60 · Updated last year
- ☆52 · Updated last year
- Official Implementation of APB (ACL 2025 main Oral) ☆32 · Updated 9 months ago
- A library for simplifying training with multi-GPU setups in the HuggingFace / PyTorch ecosystem. ☆16 · Updated last week
- ☆80 · Updated 3 weeks ago
- [WIP] Better (FP8) attention for Hopper ☆32 · Updated 9 months ago
- ☆52 · Updated 5 months ago
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation. ☆53 · Updated last year
- ☆60 · Updated 6 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆37 · Updated 2 months ago
- Defeating the Training-Inference Mismatch via FP16 ☆165 · Updated last month
- ☆39 · Updated last year
- [NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing" ☆77 · Updated 5 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆157 · Updated 8 months ago