JunHao-Zhu / FusionQuery
[VLDB 2024] Source code for FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data
☆11 · Updated last month
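As a rough orientation, the sketch below illustrates the general idea behind an on-demand fusion query over multi-source heterogeneous data: candidate values are pulled from each source only when a query arrives, and conflicting values are fused by a simple rule. It is a minimal, purely hypothetical sketch — the toy sources, the `query_sources`/`fuse` helpers, and the majority-vote fusion rule are illustrative assumptions, not FusionQuery's actual code or API.

```python
# Hypothetical sketch (NOT FusionQuery's API): on-demand retrieval from two
# heterogeneous toy sources, followed by a naive fusion step.
from collections import Counter

# Toy "sources" with different schemas (both hypothetical).
json_source = [
    {"title": "FusionQuery", "venue": "VLDB 2024"},
    {"title": "PQCache", "venue": "SIGMOD 2025"},
]
table_source = [
    ("FusionQuery", "VLDB'24"),
    ("TidalDecode", "ICLR 2025"),
]

def query_sources(title):
    """Retrieve candidate 'venue' values for a title from each source at query time."""
    candidates = []
    candidates += [r["venue"] for r in json_source if r["title"] == title]
    candidates += [venue for t, venue in table_source if t == title]
    return candidates

def fuse(candidates):
    """Naive truth fusion: majority vote over the retrieved candidate values."""
    if not candidates:
        return None
    return Counter(candidates).most_common(1)[0][0]

if __name__ == "__main__":
    values = query_sources("FusionQuery")
    print(values)        # candidate values gathered from both sources
    print(fuse(values))  # a single fused value chosen by the voting rule
```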
Alternatives and similar repositories for FusionQuery:
Users interested in FusionQuery are comparing it to the libraries listed below
- PyTorch implementation of the paper "SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration", accepted by NeurIPS … ☆22 · Updated last year
- Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆17 · Updated 2 months ago
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference ☆42 · Updated 2 months ago
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆56 · Updated last month
- [ICLR 2025] The official PyTorch implementation of "Dynamic Low-Rank Sparse Adaptation for Large Language Models". ☆16 · Updated last month
- Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking". ☆47 · Updated 9 months ago
- Stateful LLM Serving ☆63 · Updated last month
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆19 · Updated 6 months ago
- [ICLR 2024] Dynamic Neural Response Tuning ☆16 · Updated last month
- The official implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference ☆72 · Updated 3 months ago
- 16-fold memory access reduction with nearly no loss ☆91 · Updated last month
- ☆59 · Updated 10 months ago
- SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models ☆29 · Updated 8 months ago
- Implementations of several LLM KV cache sparsity methods ☆32 · Updated 10 months ago
- PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆85 · Updated last year
- FGNN's artifact evaluation (EuroSys 2022) ☆17 · Updated 3 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆25 · Updated 2 months ago
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆35 · Updated last week
- ☆22 · Updated 2 months ago
- This repo contains the source code for "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs" ☆36 · Updated 8 months ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆24 · Updated 5 months ago
- ☆11 · Updated 6 months ago
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆43 · Updated 5 months ago
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth Is Rarely Pure and Never Simple. ☆23 · Updated last week
- LLM Serving Performance Evaluation Harness ☆77 · Updated 2 months ago
- ☆70 · Updated last week
- A sparse attention kernel supporting mixed sparse patterns ☆197 · Updated 2 months ago
- A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache. ☆33 · Updated last month
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆34 · Updated last week
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆160 · Updated 9 months ago