Relaxed-System-Lab / HKUST-COMP6211J-2025fall
☆20, updated last month
Alternatives and similar repositories for HKUST-COMP6211J-2025fall
Users interested in HKUST-COMP6211J-2025fall are comparing it to the repositories listed below.
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning (☆69, updated 2 weeks ago)
- GitHub repo for the ICLR 2025 paper, Fine-tuning Large Language Models with Sparse Matrices (☆21, updated 6 months ago)
- Code release for AdapMoE, accepted by ICCAD 2024 (☆34, updated 7 months ago)
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs (☆59, updated 8 months ago)
- ☆39, updated 3 months ago
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention (☆233, updated 3 months ago)
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24) (☆46, updated 11 months ago)
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling (☆51, updated last week)
- ☆52, updated 2 months ago
- [HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache (☆63, updated last week)
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models (☆24, updated last year)
- ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25) (☆16, updated 2 months ago)
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) (☆161, updated last year)
- ☆58, updated last year
- Summary of some awesome work on optimizing LLM inference (☆138, updated 3 weeks ago)
- The official implementation of Ada-KV [NeurIPS 2025] (☆114, updated 2 months ago)
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… (☆248, updated 4 months ago)
- ☆57, updated last year
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA'25) (☆68, updated 7 months ago)
- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention] (☆46, updated 8 months ago)
- [HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System (☆49, updated 4 months ago)
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive (☆49, updated 2 months ago)
- Tile-based language built for AI computation across all scales (☆82, updated this week)
- Building the Virtuous Cycle for AI-driven LLM Systems (☆92, updated last week)
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit (☆82, updated this week)
- Efficient Reinforcement Learning for Language Models (☆43, updated last week)
- Implements some methods of LLM KV cache sparsity (☆42, updated last year)
- A reading list on some popular MLSys topics (☆16, updated 8 months ago)
- WaferLLM: Large Language Model Inference at Wafer Scale (☆75, updated 3 weeks ago)
- ☆137, updated last week