[ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
☆27 · Jul 7, 2025 · Updated 11 months ago
Alternatives and similar repositories for D2O
Users that are interested in D2O are comparing it to the libraries listed below.
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference ☆50 · Jun 19, 2024 · Updated 2 years ago
- ☆47 · Nov 25, 2024 · Updated last year
- ☆47 · Oct 16, 2025 · Updated 8 months ago
- ☆39 · Mar 17, 2025 · Updated last year
- Codebase for the ACL 2023 paper: White-Box Multi-Objective Adversarial Attack on Dialogue Generation. ☆16 · Dec 8, 2023 · Updated 2 years ago
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆136 · Nov 26, 2025 · Updated 7 months ago
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆51 · Oct 18, 2024 · Updated last year
- ☆321 · Jul 10, 2025 · Updated 11 months ago
- ☆48 · Mar 15, 2025 · Updated last year
- Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression." ☆18 · Dec 13, 2024 · Updated last year
- Sharing some problems encountered when applying S2S in practice, and their solutions. ☆28 · Aug 3, 2020 · Updated 5 years ago
- [ACL 2026] Repository of IPBench ☆23 · Apr 6, 2026 · Updated 2 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆34 · Mar 7, 2025 · Updated last year
- Code and data for "Impact of Evaluation Methodologies on Code Summarization" in ACL 2022. ☆10 · Sep 6, 2022 · Updated 3 years ago
- Uncertainty-Aware Curriculum Learning for Neural Machine Translation (ACL 2020) ☆11 · Jun 12, 2020 · Updated 6 years ago
- Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models. ☆10 · May 16, 2024 · Updated 2 years ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆53 · Dec 17, 2024 · Updated last year
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆720 · Apr 15, 2026 · Updated 2 months ago
- ☆10 · Apr 29, 2023 · Updated 3 years ago
- InternLM-7B fine-tuning, SFT/LoRA, instruction finetune ☆13 · May 17, 2024 · Updated 2 years ago
- The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Model… ☆16 · Dec 11, 2023 · Updated 2 years ago
- AloePlayer: a cross-platform local media player. ☆17 · Jan 24, 2026 · Updated 5 months ago
- An opinionated NLP research template ☆10 · Aug 29, 2024 · Updated last year
- ☆24 · Jun 7, 2021 · Updated 5 years ago
- ☆34 · Sep 19, 2025 · Updated 9 months ago
- [CVPR 2026] Variation-aware Vision Token Dropping for Faster Large Vision-Language Models ☆30 · May 27, 2026 · Updated last month
- ☆10 · Dec 3, 2024 · Updated last year
- hints for xv6lab in installing and doing ☆11 · Jan 28, 2021 · Updated 5 years ago
- ☆17 · Sep 11, 2025 · Updated 9 months ago
- LLM KV cache compression made easy ☆1,120 · Jun 22, 2026 · Updated last week
- [ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆254 · Dec 16, 2024 · Updated last year
- Fast and memory-efficient exact attention ☆22 · Updated this week
- Compiler development environment. ☆21 · Apr 9, 2026 · Updated 2 months ago
- Benchmarking Social Intelligence of Language Agents through Interactive Scenarios ☆13 · Jan 4, 2025 · Updated last year
- Adversarial Robustness for Code ☆16 · Mar 30, 2021 · Updated 5 years ago
- Source Code for Online Collective Matrix Factorization Hashing. Reference: Di Wang, Quan Wang, Yaqiang An, Xinbo Gao, and Yumin Tian. 202… ☆11 · Oct 20, 2020 · Updated 5 years ago
- Code base for the EMNLP 2021 paper, "Multi-granularity Textual Adversarial Attack with Behavior Cloning". ☆13 · Apr 18, 2022 · Updated 4 years ago
- Approximate convex decomposition (ACD) ☆10 · Sep 9, 2023 · Updated 2 years ago
- ☆16 · Jun 14, 2024 · Updated 2 years ago