Yifei-Zuo/Flash-LLA

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Yifei-Zuo/Flash-LLA)

Yifei-Zuo / Flash-LLA

Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression

☆23

Alternatives and similar repositories for Flash-LLA

Users that are interested in Flash-LLA are comparing it to the libraries listed below

Sorting:

HydraQYH / hp_rms_norm
View on GitHub
High performance RMSNorm Implement by using SM Core Storage(Registers and Shared Memory)
☆30Jan 22, 2026Updated last month
IST-DASLab / gemm-fp8
View on GitHub
High Performance FP8 GEMM Kernels for SM89 and later GPUs.
☆20Jan 24, 2025Updated last year
zhuzilin / flash-attention-with-sink
View on GitHub
☆38Aug 7, 2025Updated 7 months ago
Doraemonzzz / xmixers
View on GitHub
Xmixers: A collection of SOTA efficient token/channel mixers
☆28Sep 4, 2025Updated 6 months ago
tile-ai / AttentionEngine
View on GitHub
☆52May 19, 2025Updated 9 months ago
serdes21 / flashtile
View on GitHub
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆54Feb 6, 2026Updated last month
KuangjuX / AttnLink
View on GitHub
An experimental communicating attention kernel based on DeepEP.
☆35Jul 29, 2025Updated 7 months ago
WaveSpeedAI / QuantumAttention
View on GitHub
[WIP] Better (FP8) attention for Hopper
☆32Feb 24, 2025Updated last year
lixiaoyu2000 / HAT
View on GitHub
Official Repo For AAAI 2026 Accepted Paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception"
☆29Jan 13, 2026Updated last month
lucasjinreal / LLaVA-Magvit2
View on GitHub
LLaVA combines with Magvit Image tokenizer, training MLLM without an Vision Encoder. Unifying image understanding and generation.
☆39Jun 20, 2024Updated last year
richardodliu / OpenCodeEval
View on GitHub
☆50Aug 21, 2025Updated 6 months ago
JaesungHuh / ca-subtitle
View on GitHub
Implementation of "Look, Listen and Recognise:character-aware audio-visual subtitling"
☆19Nov 3, 2025Updated 4 months ago
caiqizh / LUQ
View on GitHub
☆13Jan 14, 2026Updated last month
stockeh / mlx-trm
View on GitHub
MLX Implementation of Recursive Reasoning with Tiny Networks
☆78Oct 11, 2025Updated 4 months ago
LGAI-Research / SetR
View on GitHub
☆20Sep 11, 2025Updated 5 months ago
FreedomIntelligence / MTalk-Bench
View on GitHub
MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols
☆17Nov 19, 2025Updated 3 months ago
speechLabBcCuny / EDANSA
View on GitHub
The Ecoacoustic Dataset from Arctic North Slope Alaska
☆11May 29, 2025Updated 9 months ago
NARUTO-2024 / WavBench
View on GitHub
WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models
☆27Feb 13, 2026Updated 3 weeks ago
sgl-project / sgl-cookbook
View on GitHub
Cookbook of SGLang - Recipe
☆87Updated this week
AniZpZ / AutoSmoothQuant
View on GitHub
An easy-to-use package for implementing SmoothQuant for LLMs
☆111Apr 7, 2025Updated 11 months ago
carsonpo / quadmul
View on GitHub
a fast and customizable CUDA int4 tensor core gemm
☆15Aug 2, 2024Updated last year
htshinichi / onnx-yolov3
View on GitHub
use yolov3 onnx model to implement object detection
☆11Apr 25, 2019Updated 6 years ago
Lucasc-99 / Actor-Critic
View on GitHub
The A2C Reinforcement Learning Algorithm in Pytorch
☆16May 13, 2024Updated last year
MoFHeka / execution-ucx
View on GitHub
A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.
☆29Feb 22, 2026Updated last week
BrightenWu / Qt-BgMatte
View on GitHub
The C++ matting code is based on BackgroundMattingV2 and RobustVideoMatting.
☆11Nov 20, 2021Updated 4 years ago
a-hamdi / native-sparse-attention
View on GitHub
☆15Feb 23, 2025Updated last year
sangminwoo / awesome-token-redundancy-reduction
View on GitHub
😎 Awesome papers on token redundancy reduction
☆11Mar 12, 2025Updated 11 months ago
ChoiDM / PIPnet
View on GitHub
Implemetation of "Pixel-In-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild"
☆11Jul 6, 2023Updated 2 years ago
LuluW8071 / Automatic-Speech-Recognition-with-PyTorch
View on GitHub
Real-Time ASR with CNN-BiLSTM: End-to-End Live Streaming Using PyTorch Lightning⚡
☆11Jan 23, 2025Updated last year
shyhoom / T22_034_CRDDC_2022_SourceCode
View on GitHub
T22_034_han_shi_hao_CRDDC_2022_SourceCode
☆11Dec 29, 2023Updated 2 years ago
ACA-Lab-SJTU / token-ring
View on GitHub
☆13Jan 7, 2025Updated last year
GeeeekExplorer / 3d-parallel-demo
View on GitHub
使用torch.distributed实现DP/TP/PP
☆13Dec 28, 2023Updated 2 years ago
GAIR-NLP / DatasetResearch
View on GitHub
DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery
☆20Sep 24, 2025Updated 5 months ago
Exgc / OmniSep
View on GitHub
Sound Separation, Omni modal
☆28Sep 15, 2025Updated 5 months ago
indigo-99 / FuRPE
View on GitHub
☆14Dec 20, 2022Updated 3 years ago
wangzhaode / tokenizer.cpp
View on GitHub
A lightweight, production-ready C++ library for LLM tokenization, fully compatible with HuggingFace tokenizer.json.
☆24Jan 4, 2026Updated 2 months ago
yuxiang-gao / awesome-llm-blogs
View on GitHub
Blogs that I'm actively following.
☆13Sep 17, 2023Updated 2 years ago
simveit / persistent_dense_gemm
View on GitHub
Persistent dense gemm for Hopper in `CuTeDSL`
☆15Aug 9, 2025Updated 6 months ago
Immortalin / Simulacra
View on GitHub
Simple and Ideal Circuit Simulation
☆13Dec 4, 2017Updated 8 years ago