WeianMao/triattention

TriAttention — Efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.

☆826

Alternatives and similar repositories for triattention

Users who are interested in triattention are comparing it to the repositories listed below.

aim-uofa / GSI-Bench
[CVPR2026] Exploring Spatial Intelligence from a Generative Perspective
☆30 · Jun 3, 2026 · Updated last month
ziplab / DDP
[CVPRW 2026 Oral] Less Detail, Better Answers: Degradation-Driven Prompting for VQA
☆20 · Apr 25, 2026 · Updated 2 months ago
aim-uofa / OmniJigsaw
☆34 · Apr 10, 2026 · Updated 3 months ago
alibaba-damo-academy / WorldOlympiad
WorldOlympiad: Can Your World Model Survive a Triathlon?
☆54 · Updated this week
z-lab / paroquant
[ICLR 2026] ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
☆325 · Jul 1, 2026 · Updated 2 weeks ago
domvox / triattention-ggml
Frequency-based KV cache pruning for llama.cpp: 25% cache reduction, improved PPL at long context. GPU compaction kernel for HIP/ROCm.
☆17 · Apr 18, 2026 · Updated 3 months ago
liranringel / ddtree
☆389 · Apr 16, 2026 · Updated 3 months ago
aim-uofa / STAIR
☆18 · Jun 13, 2026 · Updated last month
aim-uofa / ReasonMatch
[CVPR2026] Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching
☆19 · Jun 4, 2026 · Updated last month
z-lab / dflash
DFlash: Block Diffusion for Flash Speculative Decoding
☆5,500 · May 10, 2026 · Updated 2 months ago
aim-uofa / StaMo
Unsupervised Learning of Generalizable Robot Motion from Compact State Representation
☆40 · Jun 10, 2026 · Updated last month
ziplab / Pyramid-Sparse-Attention
Official PyTorch implementation of [PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation](https://arxiv.org/abs…
☆25 · Jan 25, 2026 · Updated 5 months ago
microsoft / World-R1
[ICML 2026] World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
☆409 · Jun 3, 2026 · Updated last month
microsoft / LatentSpatialMemory
Latent Spatial Memory for Video World Models
☆274 · Updated this week
aim-uofa / dLLM-MidTruth
[ICLR'26] Official PyTorch implementation of "Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models".
☆66 · Mar 5, 2026 · Updated 4 months ago
QwenLM / FlashQLA
High-performance linear attention kernel library built on TileLang.
☆597 · Updated this week
scrya-com / rotorquant
KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44…
☆1,037 · Apr 23, 2026 · Updated 2 months ago
thu-ml / SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates inference for any model.
☆1,017 · Feb 25, 2026 · Updated 4 months ago
IST-DASLab / Quartet-II
Quartet II Official Code
☆76 · May 1, 2026 · Updated 2 months ago
NVlabs / LongLive
Long Video Gen Infrastructure
☆2,479 · Updated this week
Luce-Org / lucebox
Fast LLM speculative inference server for consumer hardware.
☆2,668 · Updated this week
RightNow-AI / TIDE
Dynamic per-token early exit for LLM inference. Skips the layers that individual tokens don't need.
☆33 · Mar 18, 2026 · Updated 4 months ago
aim-uofa / EvoTokenDLM
[ACL'26] EvoToken-DLM (Beyond Hard Masks: Progressive Token Evolution for Diffusion Language)
☆48 · Apr 7, 2026 · Updated 3 months ago
NVIDIA / kvpress
LLM KV cache compression made easy
☆1,142 · Jul 9, 2026 · Updated last week
ThisisBillhe / ZipAR
[ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…
☆51 · Mar 25, 2025 · Updated last year
NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆1,063 · May 30, 2026 · Updated last month
qixinhu11 / LongLive-RAG
Official Implementation of LongLive-RAG: A general retrieval-augmented framework for long video generation.
☆97 · Jun 4, 2026 · Updated last month
NVlabs / QeRL
[ICLR 2026] QeRL enables RL for 32B LLMs on a single H100 GPU.
☆511 · Mar 30, 2026 · Updated 3 months ago
thu-ml / SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-t…
☆3,493 · Jan 17, 2026 · Updated 6 months ago
adamzweiger / compaction
Algorithms for latent compaction
☆257 · Apr 22, 2026 · Updated 2 months ago
aim-uofa / AGILE
☆46 · May 6, 2026 · Updated 2 months ago
DevTechJr / turboquant-gpu
☆259 · Apr 5, 2026 · Updated 3 months ago
KlingAIResearch / MemFlow
Official Implementation of "MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives"
☆214 · Dec 29, 2025 · Updated 6 months ago
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆732 · Jul 4, 2026 · Updated 2 weeks ago
lightseekorg / tokenspeed
TokenSpeed is a speed-of-light LLM inference engine.
☆1,638 · Updated this week
czg1225 / DMax
DMax: Aggressive Parallel Decoding for dLLMs
☆127 · Jul 5, 2026 · Updated 2 weeks ago
RightNow-AI / autokernel
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
☆1,469 · Mar 19, 2026 · Updated 4 months ago
tanishqkumar / ssd
A lightweight inference engine supporting speculative speculative decoding (SSD).
☆970 · May 10, 2026 · Updated 2 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆726 · Sep 24, 2025 · Updated 9 months ago