metacarbon/shareAtt

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/metacarbon/shareAtt)

metacarbon / shareAtt

Beyond KV Caching: Shared Attention for Efficient LLMs

☆20

Alternatives and similar repositories for shareAtt

Users that are interested in shareAtt are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

yangyifei729 / KVSharer
View on GitHub
Source code of paper ''KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing''
☆31Oct 24, 2024Updated last year
Some-random / theorem-proving-reasoning
View on GitHub
Code for the paper LeanReasoner: Boosting Complex Logical Reasoning with Lean: https://arxiv.org/pdf/2403.13312.pdf
☆27May 25, 2024Updated 2 years ago
vllm-project / tml-fa4
View on GitHub
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17Updated this week
iankur / vqllm
View on GitHub
Residual vector quantization for KV cache compression in large language model
☆12Oct 22, 2024Updated last year
zxytim / arithmetic-encoding-compression
View on GitHub
☆11Apr 3, 2023Updated 3 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
linxihui / dkernel
View on GitHub
☆22Apr 17, 2025Updated last year
OSU-NLP-Group / reversal-curse-binding
View on GitHub
☆25Apr 3, 2025Updated last year
iLearn-Lab / ACL25-PTQ1.61
View on GitHub
☆15Apr 6, 2026Updated 3 months ago
diegoantognini / GameWikiSum
View on GitHub
Video Games Dataset for Multi-Document Summarization
☆20Sep 20, 2025Updated 10 months ago
fpgasystems / fpga-hyperloglog
View on GitHub
FPGA-based HyperLogLog Accelerator
☆12Jul 13, 2020Updated 6 years ago
TiledTensor / TiledBench
View on GitHub
Benchmark tests supporting the TiledCUDA library.
☆19Nov 19, 2024Updated last year
JerryYin777 / Cross-Layer-Attention
View on GitHub
Self Reproduction Code of Paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (MIT CSAIL)
☆17May 24, 2024Updated 2 years ago
VITA-Group / llm-kick
View on GitHub
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing llms: The truth is rarely pure and never simple.
☆27Apr 21, 2025Updated last year
rsshyam / GRPO
View on GitHub
☆71Jul 28, 2024Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
WaveSpeedAI / QuantumAttention
View on GitHub
[WIP] Better (FP8) attention for Hopper
☆33Feb 24, 2025Updated last year
jy-yuan / KIVI
View on GitHub
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆419Nov 20, 2025Updated 8 months ago
HaoKang-Timmy / LatencySensitiveBench
View on GitHub
First Latency-Aware Competitive LLM Agent Benchmark
☆32Jun 3, 2025Updated last year
VITA-Group / instant_soup
View on GitHub
[ICML2023] Instant Soup Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models. Ajay Jaiswal, Shiwei Liu, Ti…
☆11Nov 28, 2023Updated 2 years ago
chijames / KERPLE
View on GitHub
☆20Oct 25, 2022Updated 3 years ago
zhuzilin / flash-attention-with-sink
View on GitHub
☆37Aug 7, 2025Updated 11 months ago
thunlp / Modularity-Analysis
View on GitHub
[ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers
☆26Jun 7, 2023Updated 3 years ago
TiledTensor / TiledLower
View on GitHub
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13Nov 23, 2024Updated last year
Infini-AI-Lab / TriForce
View on GitHub
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
☆281Aug 31, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
NVIDIA / nvidia-dlfw-inspect
View on GitHub
The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs using Nvidia libraries such as…
☆21Sep 17, 2025Updated 10 months ago
Zcchill / Value-Residual-Learning
View on GitHub
☆15Mar 20, 2025Updated last year
joonkeekim / Instructive-Decoding
View on GitHub
Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…
☆21Mar 7, 2024Updated 2 years ago
gccnlp / Light-PEFT
View on GitHub
[ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
☆13Sep 2, 2024Updated last year
sade-adrien / SteloCoder
View on GitHub
☆16Dec 21, 2023Updated 2 years ago
hexiaoxiao-cs / DICE
View on GitHub
☆16May 10, 2026Updated 2 months ago
JoeYing1019 / UltraTool
View on GitHub
[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
☆71Aug 5, 2025Updated 11 months ago
liblaf / ilatex
View on GitHub
📚 LaTeX templates and tools for creating beautiful, structured documents 📝
☆14Oct 24, 2025Updated 8 months ago
FasterDecoding / SnapKV
View on GitHub
☆324Jul 10, 2025Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
chhzh123 / ptc-tutorial
View on GitHub
PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo
☆17Mar 13, 2023Updated 3 years ago
leezythu / FocusLLM
View on GitHub
FocusLLM: Scaling LLM’s Context by Parallel Decoding
☆45Dec 8, 2024Updated last year
AgentForceTeamOfficial / UA2-Agent
View on GitHub
Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme…
☆19Nov 12, 2024Updated last year
linfeng93 / BiTA
View on GitHub
An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification.
☆28Apr 15, 2025Updated last year
BaichuanSEED / BaichuanSEED.github.io
View on GitHub
Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet…
☆18Aug 28, 2024Updated last year
gmlwns2000 / sea-attention
View on GitHub
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
☆12Jun 20, 2025Updated last year
vllm-project / vllm-nccl
View on GitHub
Manages vllm-nccl dependency
☆18Jun 3, 2024Updated 2 years ago