SqueezeAILab/MultipoleAttention

[NeurIPS 2025] Multipole Attention for Efficient Long Context Reasoning

☆24

Alternatives and similar repositories for MultipoleAttention

Users who are interested in MultipoleAttention are comparing it to the libraries listed below.

SqueezeAILab / SqueezedAttention
[ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference
☆58 • Nov 20, 2024 • Updated last year
thunlp / NOSA
The official implementation of NOSA
☆19 • Jun 11, 2026 • Updated last month
kssteven418 / SqueezeLLM-gradients
☆21 • Feb 5, 2024 • Updated 2 years ago
huangyuxiang03 / Locret
☆14 • Oct 3, 2024 • Updated last year
furiosa-ai / ParallelBench
[ICLR 2026] ParallelBench: Understanding the Tradeoffs of Parallel Decoding in Diffusion LLMs
☆47 • Mar 27, 2026 • Updated 3 months ago
SqueezeAILab / Tool2Vec
Efficient and Scalable Estimation of Tool Representations in Vector Space
☆31 • Sep 5, 2024 • Updated last year
hjeon2k / LRAgent
Official implementation of LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents
☆26 • Feb 1, 2026 • Updated 5 months ago
dongwonjo / FastKV
[ACL Findings 2026] Official Implementation of "FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acc…
☆32 • Apr 14, 2026 • Updated 3 months ago
GBATZOLIS / BitstreamDiffusion
☆15 • Updated this week
YujieLu10 / Seeker
☆11 • May 24, 2024 • Updated 2 years ago
UCSB-NLP-Chang / KVLink
☆48 • Oct 16, 2025 • Updated 9 months ago
furiosa-ai / draft-based-approx-llm
[ICLR 2026] Draft-based Approximate Inference for LLMs
☆21 • Mar 10, 2026 • Updated 4 months ago
georgia-tech-synergy-lab / MicroScopiQ-LLM-Quantization
[ISCA 2025] Official Implementation of "MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization"
☆24 • Oct 30, 2025 • Updated 8 months ago
uservan / speculative_thinking
☆34 • Oct 13, 2025 • Updated 9 months ago
zhichao-lu / llm-eps
☆13 • Jul 15, 2024 • Updated 2 years ago
FYYFU / HeadKV
[ICLR 2025] Code and data for paper: Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasonin…
☆45 • Mar 10, 2025 • Updated last year
snu-mllab / KVzip
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
☆225 • Feb 11, 2026 • Updated 5 months ago
JosephJeesungSuh / subpop
[ACL 2025 Long Main] Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions
☆44 • Apr 21, 2025 • Updated last year
CannyLab / causal_overhypotheses
Code for Dataset and Benchmarks Submission, NeurIPS 2022
☆13 • Aug 16, 2022 • Updated 3 years ago
tingofurro / headline_grouping
Codebase, data and models for the Headline Grouping paper at NAACL 2021
☆12 • Oct 2, 2022 • Updated 3 years ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆127 • Jan 27, 2026 • Updated 5 months ago
SakanaAI / fast-weight-product-key-memory
Code for Fast-weight Product Key Memory (FwPKM)
☆19 • Mar 18, 2026 • Updated 4 months ago
xAlg-ai / HashAttention-1.0
☆18 • Sep 23, 2025 • Updated 10 months ago
Infini-AI-Lab / Sirius
Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its…
☆21 • Sep 10, 2024 • Updated last year
SqueezeAILab / open_source_projects
Open Source Projects from Pallas Lab
☆21 • Oct 10, 2021 • Updated 4 years ago
ignoww / ZOODiP
[CVPR 2025] Efficient Personalization of Quantized Diffusion Model without Backpropagation
☆17 • Mar 31, 2025 • Updated last year
furiosa-ai / EfficientRollout
EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts
☆16 • Jun 24, 2026 • Updated last month
hao-ai-lab / LookaheadReasoning
[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning
☆69 • Oct 31, 2025 • Updated 8 months ago
Janghyun1230 / FastKVzip
Accurate and fast KV cache compression with a gating mechanism
☆27 • Apr 5, 2026 • Updated 3 months ago
JackChuengQAQ / CaLiG
Source code for "Fast Continuous Subgraph Matching over Streaming Graphs via Backtracking Reduction", SIGMOD 2023
☆14 • Sep 7, 2023 • Updated 2 years ago
ngocbh / trimkv
[TrimKV] Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs - [DBTrimKV] Make Each Token Count: Towards Improving Lo…
☆15 • May 13, 2026 • Updated 2 months ago
FibonaccciYan / Adamas
Adamas: Hadamard Sparse Attention for Efficient Long-context Inference
☆15 • May 19, 2026 • Updated 2 months ago
tsinghua-ideal / Twilight
[NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning
☆105 • Jul 8, 2026 • Updated 2 weeks ago
scos-lab / turboquant
TurboQuant reference implementation: KV cache compression with engineering insights (ICLR 2026 paper reproduction)
☆18 • Mar 28, 2026 • Updated 3 months ago
soyoung97 / AcuRank
☆15 • Jul 30, 2025 • Updated 11 months ago
Infini-AI-Lab / gsm_infinite
☆65 • Jun 12, 2025 • Updated last year
duchesneaumathieu / pyperlin
GPU-accelerated Perlin noise in Python
☆11 • Oct 23, 2020 • Updated 5 years ago
songmzhang / DSKDv2
The official implementation of the paper "A Dual-Space Framework for General Knowledge Distillation of Large Language Models".
☆18 • Jan 4, 2026 • Updated 6 months ago
zhouyiji / MIGE
Mutual Information Gradient Estimation
☆12 • Sep 8, 2021 • Updated 4 years ago