AIoT-MLSys-Lab / D2O
[ICLR 2025 🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
⭐14 · Updated 2 months ago
Alternatives and similar repositories for D2O
Users who are interested in D2O are comparing it to the libraries listed below.
- Official code for GliDe with a CaPE ⭐19 · Updated 9 months ago
- Codebase for Decoding Compressed Trust ⭐23 · Updated last year
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference ⭐37 · Updated 11 months ago
- Official PyTorch implementation of our paper accepted at ICLR 2024 -- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM… ⭐47 · Updated last year
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ⭐49 · Updated last year
- [ICLR 2025] The official PyTorch implementation of "Dynamic Low-Rank Sparse Adaptation for Large Language Models" ⭐18 · Updated 2 months ago
- A curated list of early-exiting methods (LLM, CV, NLP, etc.) ⭐48 · Updated 8 months ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ⭐20 · Updated last year
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ⭐59 · Updated 7 months ago
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML 2023) ⭐34 · Updated last year
- Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" ⭐44 · Updated 11 months ago
- Squeezed Attention: Accelerating Long Prompt LLM Inference ⭐47 · Updated 5 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ⭐38 · Updated last year
- Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free ⭐22 · Updated last month
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022) ⭐46 · Updated 2 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ⭐51 · Updated 2 years ago
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ⭐47 · Updated 2 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ⭐26 · Updated 3 months ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ⭐89 · Updated 11 months ago
- This repo contains the source code for Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs ⭐37 · Updated 9 months ago
- The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference ⭐74 · Updated 3 months ago