fla-org / fla-zoo
Flash-Linear-Attention models beyond language
☆11 · Updated this week
Alternatives and similar repositories for fla-zoo:
Users interested in fla-zoo are comparing it to the libraries listed below.
- ☆30 · Updated 10 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆46 · Updated 2 months ago (sketch after the list)
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …" ☆12 · Updated 2 months ago
- ☆19 · Updated last month
- ☆22 · Updated last year
- Continuous batching and parallel acceleration for RWKV6 ☆24 · Updated 9 months ago
- ☆39 · Updated last month
- Stick-breaking attention ☆52 · Updated last month (sketch after the list)
- [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…" ☆26 · Updated last year
- Transformers components but in Triton ☆32 · Updated last month
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 · Updated last year
- ☆19 · Updated 3 months ago
- Implementation of the model "Hedgehog" from the paper "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry" ☆13 · Updated last year
- Xmixers: A collection of SOTA efficient token/channel mixers ☆11 · Updated 5 months ago
- ☆14 · Updated 2 years ago
- Here we will test various linear attention designs. ☆60 · Updated 11 months ago
- LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification ☆45 · Updated last month
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆30 · Updated 10 months ago (sketch after the list)
- Self-reproduced code for the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL) ☆12 · Updated 10 months ago (sketch after the list)
- ☆17 · Updated last week
- The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" ☆45 · Updated 6 months ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 6 months ago
- Triton version of GQA flash attention, based on the tutorial ☆11 · Updated 8 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆91 · Updated last week (sketch after the list)
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆37 · Updated 6 months ago
- Open-Pandora: On-the-fly Control Video Generation ☆34 · Updated 4 months ago
- Awesome Triton Resources ☆24 · Updated 2 weeks ago
- Contextual Position Encoding, but with some custom CUDA kernels: https://arxiv.org/abs/2405.18719 ☆22 · Updated 10 months ago (sketch after the list)
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆26 · Updated last week
- ☆21 · Updated last year
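
On the bi-directional (non-causal) linear attention repo above: without a causal mask, linear attention collapses into two dense matrix products, which is exactly what makes a fused Triton kernel attractive. A minimal PyTorch sketch of the math (the elu+1 feature map and all names here are illustrative assumptions, not the repo's API):

```python
import torch

def noncausal_linear_attention(q, k, v, eps=1e-6):
    """Bi-directional (non-causal) linear attention.

    q, k: (batch, heads, seq, dim); v: (batch, heads, seq, dim_v).
    With no causal mask, softmax(QK^T)V is replaced by
    phi(Q) @ (phi(K)^T V), costing O(n * d^2) instead of O(n^2 * d).
    """
    phi_q = torch.nn.functional.elu(q) + 1.0  # positive feature map (one common choice)
    phi_k = torch.nn.functional.elu(k) + 1.0
    kv = torch.einsum("bhnd,bhne->bhde", phi_k, v)   # (d, d_v) summary over all positions
    z = phi_k.sum(dim=2)                             # normalizer: sum of mapped keys
    out = torch.einsum("bhnd,bhde->bhne", phi_q, kv)
    denom = torch.einsum("bhnd,bhd->bhn", phi_q, z).unsqueeze(-1)
    return out / (denom + eps)

q = k = v = torch.randn(2, 4, 128, 64)
print(noncausal_linear_attention(q, k, v).shape)  # torch.Size([2, 4, 128, 64])
```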
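
On stick-breaking attention: the idea is to replace softmax with a stick-breaking process in which each past key, from nearest to farthest, claims a sigmoid-sized share of the remaining probability mass, giving a built-in recency bias. The repo provides fused kernels; the dense PyTorch below is only my reimplementation of the formula for clarity:

```python
import torch
import torch.nn.functional as F

def stick_breaking_attention(q, k, v):
    """Stick-breaking attention weights, computed densely for clarity.
    For query i and key j < i:
        A_ij = sigmoid(z_ij) * prod_{j < m < i} (1 - sigmoid(z_im))
    so weights are causal by construction and need no softmax.
    Shapes: q, k -> (n, d); v -> (n, d_v)."""
    n, d = q.shape
    z = q @ k.t() / d ** 0.5
    log_beta = F.logsigmoid(z)                     # log sigmoid(z_ij)
    log_rest = F.logsigmoid(-z)                    # log (1 - sigmoid(z_ij))
    strict = torch.arange(n)[None, :] < torch.arange(n)[:, None]  # keys with j < i
    log_rest = log_rest.masked_fill(~strict, 0.0)
    csum = log_rest.cumsum(dim=-1)
    # sum_{j < m < i} log(1 - beta_im) = (row total) - (prefix sum up to j)
    log_A = log_beta + (csum[:, -1:] - csum)
    A = torch.exp(log_A).masked_fill(~strict, 0.0)
    return A @ v

q = k = v = torch.randn(8, 16)
print(stick_breaking_attention(q, k, v).shape)  # torch.Size([8, 16])
```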
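
On the "When Linear Attention Meets Autoregressive Decoding" entry: the practical payoff of linearizing an LLM shows up at decode time, where the growing KV cache is replaced by a fixed-size running state. The paper's method involves more than this; the sketch below is just the standard linear-attention decode recurrence, with illustrative names:

```python
import torch

def linear_attention_decode_step(state, z, q_t, k_t, v_t, eps=1e-6):
    """One autoregressive decoding step of causal linear attention.
    Instead of a growing KV cache, a fixed-size state is updated:
        S_t = S_{t-1} + phi(k_t) v_t^T,   z_t = z_{t-1} + phi(k_t)
        o_t = phi(q_t) S_t / (phi(q_t) . z_t)
    q_t, k_t: (dim,); v_t: (dim_v,); state: (dim, dim_v); z: (dim,)."""
    phi = lambda x: torch.nn.functional.elu(x) + 1.0
    pk, pq = phi(k_t), phi(q_t)
    state = state + torch.outer(pk, v_t)
    z = z + pk
    out = (pq @ state) / (pq @ z + eps)
    return out, state, z

d, dv = 64, 64
state, z = torch.zeros(d, dv), torch.zeros(d)
for _ in range(5):  # memory stays O(d * dv), independent of sequence length
    q_t, k_t, v_t = torch.randn(d), torch.randn(d), torch.randn(dv)
    out, state, z = linear_attention_decode_step(state, z, q_t, k_t, v_t)
print(out.shape)  # torch.Size([64])
```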
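
On the Cross-Layer Attention (CLA) reproduction: the idea is that some layers skip their own key/value projections and attend over K/V tensors produced by an earlier layer, so only that earlier layer's K/V must be cached. A toy single-head sketch (module and argument names are mine, not the paper's or the repo's):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLALayer(nn.Module):
    """Toy sketch of Cross-Layer Attention: a layer with shares_kv=True has
    no K/V projections of its own and reuses the K/V of the layer below."""
    def __init__(self, d, shares_kv=False):
        super().__init__()
        self.q_proj = nn.Linear(d, d)
        self.shares_kv = shares_kv
        if not shares_kv:
            self.k_proj = nn.Linear(d, d)
            self.v_proj = nn.Linear(d, d)

    def forward(self, x, kv=None):
        q = self.q_proj(x)
        if self.shares_kv:
            k, v = kv                          # borrow K/V from the layer below
        else:
            k, v = self.k_proj(x), self.v_proj(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v, (k, v)

x = torch.randn(2, 16, 64)
layer0 = CLALayer(64)                  # computes (and would cache) K/V
layer1 = CLALayer(64, shares_kv=True)  # stores no K/V of its own
h, kv = layer0(x)
h, _ = layer1(h, kv=kv)
print(h.shape)  # torch.Size([2, 16, 64])
```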
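
On the Forgetting Transformer repo: the paper biases the softmax attention logits with accumulated log forget gates, so distant keys are smoothly down-weighted in a data-dependent way. A naive (non-fused) sketch of that bias, with illustrative names:

```python
import torch
import torch.nn.functional as F

def forgetting_attention(q, k, v, f_logit):
    """Sketch of forget-gated softmax attention: a scalar gate f_t in (0, 1)
    per position biases the logits by the log-forget accumulated between
    key j and query i:
        score_ij = q_i . k_j / sqrt(d) + sum_{l=j+1..i} log f_l
    q, k: (n, d); v: (n, d_v); f_logit: (n,) pre-sigmoid gate values."""
    n, d = q.shape
    log_f = F.logsigmoid(f_logit)                  # log of the forget gate
    c = log_f.cumsum(0)                            # c_i = sum_{l<=i} log f_l
    decay = c[:, None] - c[None, :]                # sum_{l=j+1..i} log f_l
    scores = q @ k.t() / d ** 0.5 + decay
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(8, 16)
print(forgetting_attention(q, k, v, torch.randn(8)).shape)  # torch.Size([8, 16])
```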
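
On the Contextual Position Encoding (CoPE) entry: positions are not token counts but sums of sigmoid gates between the query and each key, so the model can learn to count, say, words or sentences instead of tokens. A sketch of just the position computation (the repo's CUDA kernels fuse this; the fractional positions are used in the paper to index a learned embedding table with interpolation, omitted here):

```python
import torch

def cope_positions(q, k):
    """Gate-based positions: p_ij = sum_{t=j..i} sigmoid(q_i . k_t).
    Returns fractional positions, shape (n, n), zero where j > i.
    q, k: (n, d)."""
    n = q.shape[0]
    gates = torch.sigmoid(q @ k.t())       # g_it in (0, 1)
    causal = torch.tril(torch.ones(n, n))
    gates = gates * causal                 # only count tokens t <= i
    # suffix sums along the key axis give sum over t in [j, i]
    p = gates.flip(-1).cumsum(-1).flip(-1)
    return p * causal

q = k = torch.randn(6, 16)
print(cope_positions(q, k))
```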