☆14 · Jun 4, 2024 · Updated last year
Alternatives and similar repositories for Q-Hitter
Users interested in Q-Hitter are comparing it to the libraries listed below.
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) ☆182 · Jul 10, 2024 · Updated last year
- This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs ☆43 · Aug 14, 2024 · Updated last year
- [ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading ☆12 · Jun 28, 2025 · Updated 9 months ago
- ☆33 · Nov 11, 2024 · Updated last year
- ☆23 · Mar 7, 2025 · Updated last year
- ☆18 · Mar 11, 2025 · Updated last year
- ☆18 · Jan 27, 2025 · Updated last year
- Code for "FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge". EMNLP 2023. ☆20 · Dec 25, 2023 · Updated 2 years ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆181 · Jul 12, 2024 · Updated last year
- PyTorch implementation of our paper accepted by ICML 2024: CaM: Cache Merging for Memory-efficient LLMs Inference ☆48 · Jun 19, 2024 · Updated last year
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24) ☆31 · Jul 4, 2024 · Updated last year
- Keyformer reduces the KV cache by identifying key tokens, without the need for fine-tuning ☆57 · Mar 26, 2024 · Updated 2 years ago
- ViTALiTy (HPCA'23) Code Repository ☆23 · Mar 13, 2023 · Updated 3 years ago
- ☆21 · Mar 7, 2024 · Updated 2 years ago
- ☆17 · Feb 13, 2021 · Updated 5 years ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆92 · Jul 17, 2025 · Updated 8 months ago
- [TRETS 2025][FPGA 2024] FPGA Accelerator for Imbalanced SpMV using HLS ☆20 · Aug 24, 2025 · Updated 7 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆112 · Oct 15, 2024 · Updated last year
- Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c… ☆27 · Dec 10, 2022 · Updated 3 years ago
- ☆14 · Nov 7, 2025 · Updated 5 months ago
- MLCommons Science benchmarking working group ☆13 · May 19, 2023 · Updated 2 years ago
- ☆13 · Jan 28, 2026 · Updated 2 months ago
- Artifacts of VLDB'22 paper "COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression" ☆10 · Aug 2, 2022 · Updated 3 years ago
- 🎙️ Retroactively fix your Zoom recordings with a click! Won 1st Place, Best Use of GCP, Best Start-Up, and Best Entrepreneurial Hack at … ☆10 · Feb 10, 2022 · Updated 4 years ago
- ☆13 · Apr 30, 2024 · Updated last year
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and … ☆37 · Aug 29, 2025 · Updated 7 months ago
- RISC-V lock-step checker based on Spike ☆14 · Mar 6, 2026 · Updated last month
- Open-source taxi dispatch software (UI design mockups for a ride-hailing and taxi app) ☆14 · Dec 22, 2020 · Updated 5 years ago
- ☆26 · Mar 14, 2024 · Updated 2 years ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆384 · Nov 20, 2025 · Updated 4 months ago
- A simple cycle-accurate DaDianNao simulator ☆13 · Mar 27, 2019 · Updated 7 years ago
- A TensorFlow implementation of the paper "Feature Super-Resolution Based Facial Expression Recognition for Multi-scale Low-Resolution Ima… ☆13 · Nov 30, 2021 · Updated 4 years ago
- ☆37 · Oct 10, 2024 · Updated last year
- Benchmarking various sparse convolution libraries: MinkowskiEngine, SpConv, TorchSparse, and Open3D. ☆13 · Apr 10, 2023 · Updated 3 years ago
- WIPE implementation ☆13 · Nov 26, 2023 · Updated 2 years ago
- A portable implementation of SZ lossy compression for AMD GPUs and Hygon DCUs. ☆10 · Feb 26, 2025 · Updated last year
- This is a general-purpose simulator for unary computing based on PyTorch, with the paper accepted to ISCA 2020 and awarded IEEE Micro Top… ☆47 · Jul 31, 2025 · Updated 8 months ago
- An experimentation platform for LLM inference optimisation ☆36 · Sep 19, 2024 · Updated last year
- An out-of-order processor that supports multiple instruction sets. ☆22 · Aug 23, 2022 · Updated 3 years ago