☆14Jun 4, 2024Updated last year
Alternatives and similar repositories for Q-Hitter
Users who are interested in Q-Hitter are comparing it to the libraries listed below.
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)☆182Jul 10, 2024Updated last year
- This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs☆43Aug 14, 2024Updated last year
- [ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading☆13Jun 28, 2025Updated 8 months ago
- ☆12Apr 9, 2025Updated 11 months ago
- ☆33Nov 11, 2024Updated last year
- ☆23Mar 7, 2025Updated last year
- ☆18Mar 11, 2025Updated last year
- ☆18Jan 27, 2025Updated last year
- Code for "FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge". EMNLP 2023.☆20Dec 25, 2023Updated 2 years ago
- Pytorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference☆48Jun 19, 2024Updated last year
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM☆180Jul 12, 2024Updated last year
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…☆149Aug 9, 2024Updated last year
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24)☆31Jul 4, 2024Updated last year
- Low-latency query compiler☆16Jun 3, 2022Updated 3 years ago
- Keyformer proposes KV Cache reduction through key tokens identification and without the need for fine-tuning☆57Mar 26, 2024Updated 2 years ago
- ViTALiTy (HPCA'23) Code Repository☆23Mar 13, 2023Updated 3 years ago
- ☆21Mar 7, 2024Updated 2 years ago
- ☆17Feb 13, 2021Updated 5 years ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.☆91Jul 17, 2025Updated 8 months ago
- ☆354Apr 2, 2024Updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization☆112Oct 15, 2024Updated last year
- Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…☆27Dec 10, 2022Updated 3 years ago
- ☆14Nov 7, 2025Updated 4 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference☆377Jul 10, 2025Updated 8 months ago
- MLCommons Science benchmarking working group☆13May 19, 2023Updated 2 years ago
- ☆13Jan 28, 2026Updated last month
- Artifacts of VLDB'22 paper "COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression"☆10Aug 2, 2022Updated 3 years ago
- 🎙️ Retroactively fix your Zoom recordings with a click! Won 1st Place, Best Use of GCP, Best Start-Up, and Best Entrepreneurial Hack at …☆10Feb 10, 2022Updated 4 years ago
- ☆13Apr 30, 2024Updated last year
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …☆37Aug 29, 2025Updated 6 months ago
- RISCV lock-step checker based on Spike☆14Mar 6, 2026Updated 3 weeks ago
- Open source taxi dispatch software; UI design mockups for a travel and ride-hailing app☆14Dec 22, 2020Updated 5 years ago
- ☆26Mar 14, 2024Updated 2 years ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache☆363Nov 20, 2025Updated 4 months ago
- A simple cycle-accurate DaDianNao simulator☆13Mar 27, 2019Updated 7 years ago
- ☆36Oct 10, 2024Updated last year
- WIPE implementation☆13Nov 26, 2023Updated 2 years ago
- ☆12Jan 19, 2022Updated 4 years ago
- This is a general-purpose simulator for unary computing based on PyTorch, with the paper accepted to ISCA 2020 and awarded IEEE Micro Top…☆47Jul 31, 2025Updated 7 months ago