Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
☆23Oct 1, 2025Updated 5 months ago
Alternatives and similar repositories for Flash-LLA
Users that are interested in Flash-LLA are comparing it to the libraries listed below
- High-performance RMSNorm implementation using SM core storage (registers and shared memory)☆30Jan 22, 2026Updated last month
- High Performance FP8 GEMM Kernels for SM89 and later GPUs.☆20Jan 24, 2025Updated last year
- ☆38Aug 7, 2025Updated 7 months ago
- Xmixers: A collection of SOTA efficient token/channel mixers☆28Sep 4, 2025Updated 6 months ago
- ☆52May 19, 2025Updated 9 months ago
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.☆54Feb 6, 2026Updated last month
- An experimental communicating attention kernel based on DeepEP.☆35Jul 29, 2025Updated 7 months ago
- [WIP] Better (FP8) attention for Hopper☆32Feb 24, 2025Updated last year
- Official Repo For AAAI 2026 Accepted Paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception"☆29Jan 13, 2026Updated last month
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆39Jun 20, 2024Updated last year
- ☆50Aug 21, 2025Updated 6 months ago
- Implementation of "Look, Listen and Recognise: character-aware audio-visual subtitling"☆19Nov 3, 2025Updated 4 months ago
- ☆13Jan 14, 2026Updated last month
- MLX Implementation of Recursive Reasoning with Tiny Networks☆78Oct 11, 2025Updated 4 months ago
- ☆20Sep 11, 2025Updated 5 months ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆17Nov 19, 2025Updated 3 months ago
- The Ecoacoustic Dataset from Arctic North Slope Alaska☆11May 29, 2025Updated 9 months ago
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models☆27Feb 13, 2026Updated 3 weeks ago
- Cookbook of SGLang - Recipe☆87Updated this week
- An easy-to-use package for implementing SmoothQuant for LLMs☆111Apr 7, 2025Updated 11 months ago
- a fast and customizable CUDA int4 tensor core gemm☆15Aug 2, 2024Updated last year
- Object detection using a YOLOv3 ONNX model☆11Apr 25, 2019Updated 6 years ago
- The A2C Reinforcement Learning Algorithm in Pytorch☆16May 13, 2024Updated last year
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.☆29Feb 22, 2026Updated last week
- The C++ matting code is based on BackgroundMattingV2 and RobustVideoMatting.☆11Nov 20, 2021Updated 4 years ago
- ☆15Feb 23, 2025Updated last year
- 😎 Awesome papers on token redundancy reduction☆11Mar 12, 2025Updated 11 months ago
- Implementation of "Pixel-In-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild"☆11Jul 6, 2023Updated 2 years ago
- Real-Time ASR with CNN-BiLSTM: End-to-End Live Streaming Using PyTorch Lightning⚡☆11Jan 23, 2025Updated last year
- T22_034_han_shi_hao_CRDDC_2022_SourceCode☆11Dec 29, 2023Updated 2 years ago
- ☆13Jan 7, 2025Updated last year
- Implementing DP/TP/PP with torch.distributed☆13Dec 28, 2023Updated 2 years ago
- DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery☆20Sep 24, 2025Updated 5 months ago
- Sound Separation, Omni modal☆28Sep 15, 2025Updated 5 months ago
- ☆14Dec 20, 2022Updated 3 years ago
- A lightweight, production-ready C++ library for LLM tokenization, fully compatible with HuggingFace tokenizer.json.☆24Jan 4, 2026Updated 2 months ago
- Blogs that I'm actively following.☆13Sep 17, 2023Updated 2 years ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 6 months ago
- Simple and Ideal Circuit Simulation☆13Dec 4, 2017Updated 8 years ago