Crys-Chen / DPad
Official implementation of "DPad: Efficient Diffusion Language Models with Suffix Dropout"
☆27 · Updated last week
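DPad targets diffusion LLM inference, where every masked suffix (future) position normally participates in attention. As a rough intuition only, the sketch below builds a mask that keeps the full prefix and just a small window of nearby suffix positions, dropping distant ones; the helper name and the `keep` parameter are illustrative assumptions, not the repo's actual interface.

```python
import torch

# Minimal sketch of the suffix-dropout idea (hypothetical helper, not DPad's API):
# keep the full prefix visible, but attend to only the `keep` nearest suffix
# (still-masked, future) positions, dropping the distant ones.
def suffix_dropout_mask(seq_len: int, gen_pos: int, keep: int) -> torch.Tensor:
    """Boolean mask of shape (seq_len,); True = position stays in attention."""
    mask = torch.zeros(seq_len, dtype=torch.bool)
    mask[: gen_pos + 1] = True                      # full prefix + current position
    mask[gen_pos + 1 : gen_pos + 1 + keep] = True   # nearest `keep` suffix slots only
    return mask

# Example: 16 positions, decoding at index 5, keeping 3 suffix slots.
print(suffix_dropout_mask(16, 5, 3).int().tolist())
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```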
Alternatives and similar repositories for DPad
Users interested in DPad are comparing it to the libraries listed below:
- ☆55 · Updated last year
- ☆75 · Updated 4 years ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆108 · Updated 4 months ago
- [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models ☆471 · Updated last year
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆309 · Updated 4 months ago
- ☆42 · Updated 2 years ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆218 · Updated last year
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆201 · Updated 6 months ago
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" ☆47 · Updated last year
- Code for the ACL 2022 publication "Transkimmer: Transformer Learns to Layer-wise Skim" ☆21 · Updated 3 years ago
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting" ☆59 · Updated last year
- This repository contains integer operators on GPUs for PyTorch. ☆213 · Updated last year
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆77 · Updated 2 months ago
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆24 · Updated 11 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆330 · Updated last month
- Code repository of "Evaluating Quantized Large Language Models" ☆130 · Updated 11 months ago
- QAQ: Quality Adaptive Quantization for LLM KV Cache ☆52 · Updated last year
- ☆85 · Updated 3 years ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24) ☆43 · Updated 8 months ago
- Official code for GliDe with a CaPE ☆18 · Updated last year
- ☆78 · Updated 4 months ago
- ☆150 · Updated last year
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆76 · Updated last week
- 16-fold memory access reduction with nearly no loss ☆104 · Updated 5 months ago
- ☆56 · Updated 9 months ago
- The official implementation of "Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference" ☆92 · Updated 2 months ago
- Implementations of several LLM KV cache sparsity methods ☆35 · Updated last year
- ☆277 · Updated last month
- ☆69 · Updated last year
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆103 · Updated last week