VITA-Group / Q-HitterLinks

☆15

Alternatives and similar repositories for Q-Hitter

Users that are interested in Q-Hitter are comparing it to the libraries listed below

Sorting:

pku-liang / ArkVale
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆46Updated 11 months ago
ranggihwang / Pregated_MoE
☆57Updated last year
cornell-zhang / llm-datatypes
Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
☆27Updated last year
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆59Updated 8 months ago
snu-comparch / Tender
Tender: Accelerating Large Language Models via Tensor Decompostion and Runtime Requantization (ISCA'24)
☆20Updated last year
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆49Updated 3 months ago
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆55Updated last year
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆96Updated 4 months ago
leesou / PIM-DL-ASPLOS
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
☆33Updated last year
zyqCSL / DiffKV
☆30Updated last month
SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆121Updated 4 months ago
apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆28Updated last year
d-matrix-ai / keyformer-llm
☆58Updated last year
PKU-SEC-Lab / AdapMoE
Code release for AdapMoE accepted by ICCAD 2024
☆34Updated 7 months ago
AIS-SNU / GraNNDis_Artifact
[PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min…
☆10Updated last year
naver-aics / lut-gemm
☆80Updated last year
PingchengDong / GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
☆20Updated 11 months ago
clevercool / ANT-Quantization
☆112Updated 2 years ago
aiha-lab / MX-QLLM
LLM Inference with Microscaling Format
☆32Updated last year
hgyhungry / alcop-artifact
☆23Updated 2 years ago
microsoft / SparTA
☆159Updated last year
iankur / vqllm
Residual vector quantization for KV cache compression in large language model
☆10Updated last year
uwsampl / sparsetir-artifact
Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning"
☆26Updated 2 years ago
AIS-SNU / Smart-Infinity
[HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
☆49Updated 4 months ago
PrincetonUniversity / LLMCompass
☆207Updated last month
upmem / upmem_llm_framework
UPMEM LLM Framework allows profiling PyTorch layers and functions and simulate those layers/functions with a given hardware profile.
☆36Updated 3 months ago
cat538 / SKVQ
[COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
☆24Updated last year
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆90Updated 3 years ago
hgyhungry / ShflBW_Sparse_NN
☆16Updated 3 years ago
YJHMITWEB / ExFlow
Explore Inter-layer Expert Affinity in MoE Model Inference
☆15Updated last year