AIoT-MLSys-Lab / D2O
[ICLR 2025 🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
⭐ 16 · Updated 3 months ago
Alternatives and similar repositories for D2O
Users interested in D2O are comparing it to the libraries listed below.
- ⭐ 42 · Updated 7 months ago
- Official code for GliDe with a CaPE · ⭐ 19 · Updated 10 months ago
- Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free · ⭐ 27 · Updated 2 months ago
- ⭐ 33 · Updated 2 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection · ⭐ 45 · Updated 7 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [arXiv '25] · ⭐ 39 · Updated last month
- ⭐ 57 · Updated last year
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference · ⭐ 39 · Updated last year
- ⭐ 18 · Updated 7 months ago
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models · ⭐ 54 · Updated last year
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration · ⭐ 51 · Updated 4 months ago
- Official implementation for LaCo (EMNLP 2024 Findings) · ⭐ 17 · Updated 8 months ago
- ⭐ 104 · Updated 2 weeks ago
- Official PyTorch implementation of our paper accepted at ICLR 2024 -- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM… · ⭐ 47 · Updated last year
- ⭐ 30 · Updated last month
- Model merging is a highly efficient approach for long-to-short reasoning. · ⭐ 65 · Updated 3 weeks ago
- [ICLR 2025] The official PyTorch implementation of "Dynamic Low-Rank Sparse Adaptation for Large Language Models". · ⭐ 19 · Updated 3 months ago
- ⭐ 16 · Updated 2 months ago
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification · ⭐ 54 · Updated 3 months ago
- ⭐ 9 · Updated 9 months ago
- ⭐ 119 · Updated last month
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models · ⭐ 21 · Updated last year
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) · ⭐ 39 · Updated last year
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference · ⭐ 39 · Updated last year
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" · ⭐ 44 · Updated last year
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… · ⭐ 73 · Updated this week
- This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs · ⭐ 38 · Updated 10 months ago
- A block pruning framework for LLMs. · ⭐ 23 · Updated last month
- ⭐ 50 · Updated last year
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling · ⭐ 34 · Updated this week