JunHao-Zhu / FusionQuery
[VLDB 2024] Source code for FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data
☆11 · Updated 3 months ago
Alternatives and similar repositories for FusionQuery
Users interested in FusionQuery are comparing it to the repositories listed below
- PyTorch implementation of our paper SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration, accepted by NeurIPS …☆22 · Updated last year
- Analyzing problems in AI with math and code☆17 · Updated 2 weeks ago
- DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.com/p/1218643…☆21 · Updated 3 months ago
- The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference☆79 · Updated 5 months ago
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference☆55 · Updated last week
- Implementations of several LLM KV cache sparsity methods☆32 · Updated last year
- [ICLR 2024] Dynamic Neural Response Tuning☆16 · Updated 3 months ago
- ☆54 · Updated last year
- An experimentation platform for LLM inference optimisation☆31 · Updated 9 months ago
- ☆14 · Updated last year
- LLM Inference with Deep Learning Accelerator.☆44 · Updated 5 months ago
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention☆39 · Updated 2 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)☆40 · Updated 6 months ago
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)☆61 · Updated 2 months ago
- Explore Inter-layer Expert Affinity in MoE Model Inference☆9 · Updated last year
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025)☆21 · Updated last year
- Adaptive Attention Sparsity with Hierarchical Top-p Pruning☆17 · Updated 4 months ago
- Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system.☆52 · Updated last week
- ☆19 · Updated 6 months ago
- A GPU-optimized system for efficient long-context LLM decoding with low-bit KV cache.☆47 · Updated 2 weeks ago
- Official code implementation for the ICLR 2025 accepted paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives"☆34 · Updated 3 months ago
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization☆12 · Updated 7 months ago
- Official implementation of the ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking".☆48 · Updated 11 months ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs☆48 · Updated 3 months ago
- 16-fold memory access reduction with nearly no loss☆99 · Updated 3 months ago
- [ICLR 2022] "PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication" by Cheng Wan, Y…☆33 · Updated 2 years ago
- This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.☆83 · Updated 8 months ago
- ☆62 · Updated last year
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models☆20 · Updated 8 months ago
- PyTorch implementation of our paper OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks, accepted by ECCV 2024.☆18 · Updated 11 months ago