hetailang/SqueezeAttention

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/hetailang/SqueezeAttention)

hetailang / SqueezeAttention

☆37

Alternatives and similar repositories for SqueezeAttention

Users that are interested in SqueezeAttention are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Zoeyyao27 / SirLLM
View on GitHub
This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM
☆60May 28, 2024Updated 2 years ago
open-compass / Ada-LEval
View on GitHub
The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"
☆56May 22, 2025Updated last year
graphcore-research / llm-inference-research
View on GitHub
An experimentation platform for LLM inference optimisation
☆36Sep 19, 2024Updated last year
d-matrix-ai / keyformer-llm
View on GitHub
Keyformer proposes KV Cache reduction through key tokens identification and without the need for fine-tuning
☆57Mar 26, 2024Updated 2 years ago
justarter / E2URec
View on GitHub
Official Code for paper "Towards Efficient and Effective Unlearning of Large Language Models for Recommendation" (Frontiers of Computer S…
☆38Jul 19, 2024Updated 2 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
ClubieDong / QAQ-KVCacheQuantization
View on GitHub
QAQ: Quality Adaptive Quantization for LLM KV Cache
☆55Mar 27, 2024Updated 2 years ago
git-disl / Virus
View on GitHub
This is the official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation"
☆56Feb 2, 2025Updated last year
GATECH-EIC / ShiftAddViT
View on GitHub
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
☆30Dec 6, 2023Updated 2 years ago
tianyi-lab / C3PO
View on GitHub
[COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"
☆21Apr 9, 2025Updated last year
VITA-Group / Q-Hitter
View on GitHub
☆15Jun 4, 2024Updated 2 years ago
zyxxmu / cam
View on GitHub
Pytorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference
☆50Jun 19, 2024Updated 2 years ago
machilusZ / FastGen
View on GitHub
This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
☆44Aug 14, 2024Updated last year
FasterDecoding / SnapKV
View on GitHub
☆327Jul 10, 2025Updated last year
FMInference / H2O
View on GitHub
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
☆530Aug 1, 2024Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
mwatkins1970 / SAE_Feature_Interpretability_Tool
View on GitHub
A tool to assist in the interpretation of learned features in sparse autoencoders (in particular the four SAE's trained by Joseph Bloom o…
☆19Oct 4, 2024Updated last year
shuzhangzhong / HybriMoE-Preview
View on GitHub
☆17Apr 9, 2025Updated last year
ztxtech / Time-Evidence-Fusion-Network
View on GitHub
Official implementation of "Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting" (https://arxiv.org/abs/…
☆104Apr 22, 2025Updated last year
ConiferLM / Conifer
View on GitHub
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
☆91Apr 4, 2024Updated 2 years ago
DRSY / EasyKV
View on GitHub
Easy control for Key-Value Constrained Generative LLM Inference(https://arxiv.org/abs/2402.06262)
☆62Feb 13, 2024Updated 2 years ago
shenao-zhang / reward-augmented-preference
View on GitHub
The official implementation of Preference Data Reward-Augmentation.
☆18May 1, 2025Updated last year
nttmdlab-nlp / InstructDoc
View on GitHub
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
☆162May 31, 2024Updated 2 years ago
prs-eth / LoRA-Ensemble
View on GitHub
LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks
☆55Mar 7, 2026Updated 4 months ago
CERT-Lab / lora-sb
View on GitHub
Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
☆52Oct 17, 2025Updated 9 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
Infini-AI-Lab / TriForce
View on GitHub
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
☆281Aug 31, 2024Updated last year
schwartz-lab-NLP / TOVA
View on GitHub
Token Omission Via Attention
☆131Oct 13, 2024Updated last year
VILA-Lab / DELT
View on GitHub
(CVPR 2025) Official implementation to DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation which outperforms SOTA…
☆28Aug 23, 2025Updated 11 months ago
moucheng2017 / SOP-LVM-ICL-Ensemble
View on GitHub
[NeurIPS VLM workshop 2024] In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Underst…
☆23Mar 16, 2025Updated last year
ZBox1005 / CoT-UQ
View on GitHub
[ACL 2025] "CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought"
☆17Apr 3, 2025Updated last year
SqueezeAILab / SqueezedAttention
View on GitHub
[ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference
☆58Nov 20, 2024Updated last year
YujieLu10 / Seeker
View on GitHub
☆11May 24, 2024Updated 2 years ago
prateeky2806 / ComPEFT
View on GitHub
☆26Nov 23, 2023Updated 2 years ago
facebookresearch / ZeroSumEval
View on GitHub
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆35Apr 20, 2025Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
PootieT / explain-then-translate
View on GitHub
Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations…
☆29Dec 5, 2023Updated 2 years ago
annosubmission / GRC-Cache
View on GitHub
☆16Mar 13, 2023Updated 3 years ago
IST-DASLab / peft-rosa
View on GitHub
A fork of the PEFT library, supporting Robust Adaptation (RoSA)
☆15Aug 16, 2024Updated last year
XiaoduoAILab / XmodelLM
View on GitHub
XmodelLM
☆38Nov 19, 2024Updated last year
yangjianxin1 / LongQLoRA
View on GitHub
LongQLoRA: Extent Context Length of LLMs Efficiently
☆170Nov 12, 2023Updated 2 years ago
giangdip2410 / HyperRouter
View on GitHub
Code for this paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"
☆33Nov 29, 2023Updated 2 years ago
r-three / realistic_evaluation_of_model_merging_for_compositional_generalization
View on GitHub
☆13Feb 11, 2026Updated 5 months ago