Relaxed-System-Lab / HKUST-COMP6211J-2025fall
☆19 · Updated last week
Alternatives and similar repositories for HKUST-COMP6211J-2025fall
Users who are interested in HKUST-COMP6211J-2025fall are comparing it to the libraries listed below.
- A sparse attention kernel supporting mixed sparse patterns ☆315 · Updated 8 months ago
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification ☆29 · Updated 6 months ago
- Code for "Accelerating Transformer Pre-training with 2:4 Sparsity" ☆24 · Updated 10 months ago
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention ☆189 · Updated last month
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference ☆47 · Updated last year
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention ☆78 · Updated this week
- [WSDM'24 Oral] The official implementation of the paper "DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting" ☆21 · Updated last year
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ☆58 · Updated 3 months ago
- Code release for AdapMoE, accepted by ICCAD 2024 ☆34 · Updated 5 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆236 · Updated 3 months ago
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs ☆171 · Updated last year
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS '24) ☆43 · Updated 10 months ago
- ☆56 · Updated last year
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆219 · Updated 2 months ago
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆20 · Updated 8 months ago
- [HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System ☆49 · Updated 2 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆156 · Updated 3 weeks ago
- 16-fold memory access reduction with nearly no loss ☆105 · Updated 6 months ago
- ☆49 · Updated last month
- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention] ☆39 · Updated 7 months ago
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆105 · Updated 3 weeks ago
- Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation ☆16 · Updated 4 months ago
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆24 · Updated last year
- Learnable Semi-structured Sparsity for Vision Transformers and Diffusion Transformers ☆14 · Updated 8 months ago
- Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference ☆163 · Updated 3 weeks ago
- ☆60 · Updated 10 months ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… ☆164 · Updated last month
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆125 · Updated 6 months ago
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA '25) ☆62 · Updated 5 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] ☆53 · Updated 2 weeks ago