fvliang / DART
Official implementation of DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference.
☆40 · Updated this week
Alternatives and similar repositories for DART
Users interested in DART are comparing it to the repositories listed below.
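DART belongs to the speculative-decoding family: a cheap draft step proposes several tokens, and the target model verifies them in one forward pass. As a rough orientation, here is a minimal sketch of the generic greedy-verification loop such methods build on; all names are illustrative assumptions, not DART's actual API, and DART's diffusion-inspired drafting step is not modeled here.

```python
# Hypothetical sketch of greedy speculative-decoding verification.
# `draft_tokens` are the cheap model's proposals; `target_argmax` are the
# target model's argmax predictions at each drafted position (one pass).

def speculative_step(draft_tokens, target_argmax):
    """Accept the longest prefix of draft tokens the target agrees with,
    then substitute the target's own token at the first disagreement."""
    accepted = []
    for drafted, verified in zip(draft_tokens, target_argmax):
        if drafted == verified:
            accepted.append(drafted)   # target agrees: keep drafted token
        else:
            accepted.append(verified)  # disagreement: take target's token
            break                      # and discard the rest of the draft
    else:
        # All drafts accepted; the target's extra next-token prediction,
        # if available, comes for free.
        if len(target_argmax) > len(draft_tokens):
            accepted.append(target_argmax[len(draft_tokens)])
    return accepted

# Draft proposes [5, 7, 9]; target's verified argmax is [5, 7, 2, 4]:
print(speculative_step([5, 7, 9], [5, 7, 2, 4]))  # -> [5, 7, 2]
```

The speedup comes from the target model scoring all drafted positions in a single batched pass instead of one pass per token.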
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆180 · Updated last year
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆658 · Updated 4 months ago
- DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing (WACV 2025) ☆12 · Updated this week
- PyTorch implementation of PTQ4DiT (https://arxiv.org/abs/2405.16005) ☆45 · Updated last year
- Curated collection of papers on MoE model inference ☆341 · Updated 3 months ago
- Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes. ☆411 · Updated 11 months ago
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention ☆278 · Updated 2 months ago
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆281 · Updated 2 months ago
- [ICLR 2025] OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt… ☆88 · Updated 10 months ago
- ☆34 · Updated 10 months ago
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI '24) ☆174 · Updated last year
- [ICLR '25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆149 · Updated 10 months ago
- 🎓 Automatically updated list of LLM inference systems papers via GitHub Actions (refreshed every 12 hours) ☆12 · Updated this week
- Summary of some awesome work on optimizing LLM inference ☆173 · Updated 2 months ago
- This repository contains low-bit quantization papers from 2020 to 2025 at top conferences. ☆95 · Updated 4 months ago
- 📚 A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc. 🎉 ☆518 · Updated 3 weeks ago
- Paper reading and discussion notes covering AI frameworks, distributed systems, cluster management, etc. ☆54 · Updated 3 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆260 · Updated last year
- [ACM MM 2025] MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization ☆35 · Updated 5 months ago
- 📚 Collection of awesome generation-acceleration resources. ☆387 · Updated 7 months ago
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification ☆32 · Updated 10 months ago
- [NeurIPS '24] Efficient and accurate memory-saving method for W4A4 large multimodal models. ☆95 · Updated last year
- [ICCV '25] The official code of the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" ☆69 · Updated 3 weeks ago
- ☆26 · Updated last year
- ☆32 · Updated 6 months ago
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" ☆211 · Updated 2 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆269 · Updated 7 months ago
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) ☆34 · Updated this week
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" ☆75 · Updated 10 months ago
- A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, including languag… ☆205 · Updated last year