☆14Jun 4, 2024Updated last year
Alternatives and similar repositories for Q-Hitter
Users that are interested in Q-Hitter are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)☆184Jul 10, 2024Updated last year
- This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs☆43Aug 14, 2024Updated last year
- [ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading☆12Jun 28, 2025Updated 10 months ago
- ☆23Mar 7, 2025Updated last year
- ☆19Mar 11, 2025Updated last year
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- ☆18Jan 27, 2025Updated last year
- Code for "FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge". EMNLP 2023.☆20Dec 25, 2023Updated 2 years ago
- GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM☆183Jul 12, 2024Updated last year
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…☆150Aug 9, 2024Updated last year
- Tender: Accelerating Large Language Models via Tensor Decompostion and Runtime Requantization (ISCA'24)☆31Jul 4, 2024Updated last year
- Keyformer proposes KV Cache reduction through key tokens identification and without the need for fine-tuning☆57Mar 26, 2024Updated 2 years ago
- ViTALiTy (HPCA'23) Code Repository☆23Mar 13, 2023Updated 3 years ago
- ☆21Mar 7, 2024Updated 2 years ago
- ☆17Feb 13, 2021Updated 5 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.☆92Jul 17, 2025Updated 9 months ago
- [TRETS 2025][FPGA 2024] FPGA Accelerator for Imbalanced SpMV using HLS☆21Aug 24, 2025Updated 8 months ago
- ☆355Apr 2, 2024Updated 2 years ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization☆112Oct 15, 2024Updated last year
- ☆14Nov 7, 2025Updated 5 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference☆382Jul 10, 2025Updated 9 months ago
- MLCommons Science benchmarking working group☆13Apr 17, 2026Updated 2 weeks ago
- ☆13Jan 28, 2026Updated 3 months ago
- Artifacts of VLDB'22 paper "COMET: A Novel Memory-Efficient Deep Learning TrainingFramework by Using Error-Bounded Lossy Compression"☆10Aug 2, 2022Updated 3 years ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- ☆13Apr 30, 2024Updated 2 years ago
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …☆39Aug 29, 2025Updated 8 months ago
- open source taxi dispatch software 出行加打车软件UI设计效果图☆14Dec 22, 2020Updated 5 years ago
- ☆26Mar 14, 2024Updated 2 years ago
- RISCV lock-step checker based on Spike☆14Mar 6, 2026Updated 2 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache